Custom Functions

source("custom_functions.R")

2. Hypothesis: 2. PSC recurrence is associated with a specific composition of the gut microbiome

rPSC vs non-rPSC vs Healthy analysis on merged data

rPSC vs non-rPSC –by this comparison, we will find the effect of recurrence (whether the microbial composition is associated with it)

non-rPSC vs Healthy – by this comparison, we will find if the non-rPSC samples are „closer“ to the healthy samples then rPSC samples in terms of the microbial composition. In other words, does non-rPSC samples have „healthy microbiome“?

rPSC vs Healthy – by this comparison, we will find how much are the rPSC samples different from healthy samples.

However, the rPSC and non-rPSC samples are not different at all (see results). Therefore, we can do EXCLUSIVE LEFT JOIN of differentially abundant taxa from the analyses (rPSC vs Healthy, non-rPSC vs Healthy)

Data Import

Importing ASV, taxa and metadata tables for both czech and norway samples.

Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

Merging data

Merging two countries to create whole dataset

asv_tab <- merge(asv_tab_ikem,asv_tab_norway,by="SeqID",all=TRUE)
taxa_tab <- merging_taxa_tables(taxa_tab_ikem,taxa_tab_norway)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q2")
Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 641 ASV(s)
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q2")
Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 1096 ASV(s)
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

Data Analysis - Terminal ileum

segment="terminal_ileum"

Filtering

Rules: - prevalence > 5% (per group) - nearZeroVar with default settings - sequencing depth > 5000 - taxonomic assignment at least order

Rarefaction Curve

path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))
load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)
Warning: NaNs producedThe color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)
Removing 104 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")

Alpha diversity

path = "../results/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)

alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

Plots

Country plot

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

pdf("../figures/Q2/alpha_diversity_terminal_ileum.pdf",
    height =4,width = 7)
p_boxplot_alpha
dev.off()

Custom plot

alpha_data <- alpha_data %>% 
  dplyr::select(-c("Simpson","Pielou")) %>%
  mutate(Richness=Observe)

p_A <- alpha_diversity_custom_2(alpha_data,
                                size = 1.5,
                                width = 0.3)
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Custom")]] <- p_A

p_A

Linear Model

path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -12.199 11.972 -1.019 0.310 0.349
non-rPSC , rPSC - CZ vs NO -21.486 11.452 -1.876 0.063 0.141
non-rPSC vs GrouprPSC:CountryNO 9.347 23.088 0.405 0.686 0.686
healthy vs GrouprPSC -37.741 12.284 -3.072 0.003 0.025 *
healthy , rPSC - CZ vs NO 11.203 10.973 1.021 0.310 0.349
healthy vs GrouprPSC:CountryNO -23.342 21.355 -1.093 0.277 0.349
healthy vs Groupnon-rPSC -25.542 9.322 -2.740 0.007 0.031 *
healthy , non-rPSC - CZ vs NO 11.203 10.933 1.025 0.307 0.349
healthy vs Groupnon-rPSC:CountryNO -32.689 15.108 -2.164 0.032 0.096

knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
(non-rPSC) - healthy CZ -25.542 9.322 174 -2.740 0.007
(non-rPSC) - healthy NO -58.231 11.889 174 -4.898 0.000

Shannon

results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.102 0.146 -0.696 0.487 0.731
non-rPSC , rPSC - CZ vs NO -0.349 0.140 -2.500 0.014 0.123
non-rPSC vs GrouprPSC:CountryNO -0.006 0.281 -0.021 0.983 0.983
healthy vs GrouprPSC -0.210 0.140 -1.501 0.137 0.321
healthy , rPSC - CZ vs NO 0.005 0.125 0.043 0.966 0.983
healthy vs GrouprPSC:CountryNO -0.360 0.244 -1.478 0.142 0.321
healthy vs Groupnon-rPSC -0.109 0.115 -0.948 0.344 0.620
healthy , non-rPSC - CZ vs NO 0.005 0.135 0.040 0.968 0.983
healthy vs Groupnon-rPSC:CountryNO -0.354 0.186 -1.905 0.058 0.263

knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
(non-rPSC) - healthy CZ -0.109 0.115 174 -0.948 0.344
(non-rPSC) - healthy NO -0.463 0.146 174 -3.164 0.002

Simpson

results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}


# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC 0.000 0.023 0.019 0.985 0.985
non-rPSC , rPSC - CZ vs NO -0.034 0.022 -1.573 0.118 0.531
non-rPSC vs GrouprPSC:CountryNO -0.029 0.043 -0.661 0.510 0.767
healthy vs GrouprPSC -0.009 0.018 -0.500 0.618 0.767
healthy , rPSC - CZ vs NO -0.008 0.016 -0.501 0.617 0.767
healthy vs GrouprPSC:CountryNO -0.055 0.031 -1.779 0.078 0.531
healthy vs Groupnon-rPSC -0.009 0.016 -0.564 0.574 0.767
healthy , non-rPSC - CZ vs NO -0.008 0.019 -0.410 0.682 0.767
healthy vs Groupnon-rPSC:CountryNO -0.026 0.027 -0.975 0.331 0.767

knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
rPSC - healthy CZ -0.009 0.018 102 -0.500 0.618
rPSC - healthy NO -0.064 0.025 102 -2.526 0.013

Pielou

results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.004 0.022 -0.184 0.854 0.946
non-rPSC , rPSC - CZ vs NO -0.047 0.021 -2.260 0.025 0.229
non-rPSC vs GrouprPSC:CountryNO -0.005 0.042 -0.108 0.914 0.946
healthy vs GrouprPSC -0.001 0.021 -0.067 0.946 0.946
healthy , rPSC - CZ vs NO -0.009 0.018 -0.463 0.645 0.946
healthy vs GrouprPSC:CountryNO -0.043 0.036 -1.195 0.235 0.705
healthy vs Groupnon-rPSC 0.003 0.018 0.146 0.884 0.946
healthy , non-rPSC - CZ vs NO -0.009 0.021 -0.409 0.683 0.946
healthy vs Groupnon-rPSC:CountryNO -0.038 0.029 -1.329 0.185 0.705

knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

Main analysis - Genus, Aitchison

Genus level, Aitchison distance

level="genus"
path = "../results/Q2/beta_diversity"
pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)
Removing 2 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]
PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 215.579 1.208 0.009 0.107 0.107
rPSC vs healthy 1 561.696 3.345 0.030 0.001 0.002 **
non-rPSC vs healthy 1 760.242 4.431 0.024 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 601.929 3.372 0.024 0.001 0.001 ***
rPSC vs healthy , Country 1 496.841 2.959 0.027 0.001 0.001 ***
non-rPSC vs healthy , Country 1 624.474 3.640 0.020 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 156.176 0.874 0.006 0.788 0.788
rPSC vs healthy : Country 1 164.743 0.981 0.009 0.464 0.696
non-rPSC vs healthy : Country 1 209.954 1.225 0.007 0.089 0.267

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
  for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}
Plots

Custom

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

pdf("../figures/Q2/beta_diversity_terminal_ileum.pdf",
    height =5,width = 5)
p
dev.off()

Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

Supplementary analysis

supplements_beta <- list()

Genus level

level="genus"
Bray-Curtis

PERMANOVA

pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.251 1.208 0.008 0.241 0.241
rPSC vs healthy 1 0.813 4.432 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.241 6.538 0.034 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1.449 6.976 0.049 0.001 0.001 ***
rPSC vs healthy , Country 1 0.972 5.300 0.047 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.336 7.038 0.037 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.151 0.724 0.005 0.803 0.803
rPSC vs healthy : Country 1 0.287 1.572 0.014 0.067 0.100
non-rPSC vs healthy : Country 1 0.354 1.873 0.010 0.011 0.033 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1   0.5823 0.02622 3.0424  0.001      0.001
Residual 113  21.6268 0.97378                         
Total    114  22.2090 1.00000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   1.0126 0.08265 5.4958  0.001      0.001
Residual 61  11.2391 0.91735                         
Total    62  12.2517 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.0157 0.04597 4.9629  0.001      0.001
Residual 103  21.0797 0.95403                         
Total    104  22.0954 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Cov       1   0.6741 0.0541 4.0607  0.001      0.001
Residual 71  11.7862 0.9459                         
Total    72  12.4603 1.0000                         

Plots

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

Jaccard

PERMANOVA

pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.337 1.138 0.008 0.241 0.241
rPSC vs healthy 1 0.887 3.269 0.030 0.001 0.002 **
non-rPSC vs healthy 1 1.349 4.846 0.026 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1.460 4.927 0.035 0.001 0.001 ***
rPSC vs healthy , Country 1 1.092 4.023 0.036 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.476 5.300 0.029 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.242 0.817 0.006 0.823 0.823
rPSC vs healthy : Country 1 0.342 1.261 0.011 0.118 0.177
non-rPSC vs healthy : Country 1 0.445 1.604 0.009 0.014 0.042 *

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    0.699 0.02174 2.5112  0.001      0.001
Residual 113   31.468 0.97826                         
Total    114   32.167 1.00000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac       1   1.0948 0.06116 3.9739  0.001      0.001
Residual 61  16.8056 0.93884                         
Total    62  17.9004 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.0948 0.03497 3.7326  0.001      0.001
Residual 103  30.2119 0.96503                         
Total    104  31.3067 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov       1   0.8258 0.04372 3.2463  0.001      0.001
Residual 71  18.0612 0.95628                         
Total    72  18.8870 1.00000                         

Plots

Custom

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

ASV level

level="ASV"
Aitchison

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 316.296 1.176 0.008 0.071 0.071
rPSC vs healthy 1 830.867 2.897 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1156.137 4.137 0.023 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 639.519 2.378 0.017 0.001 0.001 ***
rPSC vs healthy , Country 1 598.496 2.087 0.019 0.001 0.001 ***
non-rPSC vs healthy , Country 1 752.840 2.694 0.015 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 222.179 0.825 0.006 0.960 0.96
rPSC vs healthy : Country 1 262.783 0.916 0.008 0.746 0.96
non-rPSC vs healthy : Country 1 291.928 1.045 0.006 0.333 0.96

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                           filt_ileum_taxa_tab,
                           filt_ileum_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p

Bray-Curtis

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.505 1.557 0.011 0.026 0.026 *
rPSC vs healthy 1 1.342 4.364 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.979 6.424 0.034 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1.444 4.451 0.032 0.001 0.001 ***
rPSC vs healthy , Country 1 1.059 3.444 0.031 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.394 4.525 0.024 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.333 1.027 0.007 0.386 0.386
rPSC vs healthy : Country 1 0.390 1.272 0.011 0.093 0.140
non-rPSC vs healthy : Country 1 0.447 1.456 0.008 0.036 0.108

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
}

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

Jaccard

PERMANOVA

# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.518 1.318 0.009 0.029 0.029 *
rPSC vs healthy 1 1.117 2.934 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1.607 4.216 0.023 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1.214 3.089 0.022 0.001 0.001 ***
rPSC vs healthy , Country 1 0.939 2.467 0.023 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.203 3.157 0.017 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.390 0.992 0.007 0.469 0.469
rPSC vs healthy : Country 1 0.430 1.130 0.010 0.131 0.196
non-rPSC vs healthy : Country 1 0.501 1.318 0.007 0.026 0.078

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
}

PCoA

p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

Saving results

write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))

Univariate Analysis

Main - Genus level

level="genus"
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# rPSC effect
rpsc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

rPSC vs non-rPSC

group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 55 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  138  samples and  197  features will be tested!
Warning: Some features have less than 3 nonzero values! 
                        They have virtually no statistical power. You may consider filtering them in the analysis!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 1 row containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 7 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 67 ASV(s)
Removing 4 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  106  samples and  207  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 23 rows containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 9 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

non-rPSC vs healthy

group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 24 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  178  samples and  179  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 32 rows containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 14 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(
  list_heatmap,                               uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")
min_clr -1.727234 
max_clr 7.029952 
min_log -4.215042 
max_log 3.331439 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
Warning: Removed 64 rows containing missing values or values outside the scale
range (`geom_point()`).Warning: Removed 64 rows containing missing values or values outside the scale
range (`geom_text()`).
p


dot_heatmap_ileum <- p
pdf("../figures/Q2/dotplot_terminal_ileum.pdf",
    height =10,width = 4)
p
dev.off()

rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 

Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

Supplementary Analysis

supplements_uni <- list()
supplements_wb <- createWorkbook()

ASV level

level="ASV"
path_maaslin="../intermediate_files/maaslin/Q2/ASV/"
raw_linda_results <- list();
raw_linda_results[[segment]] <- list()
linda_results <- list(); 
linda_results[[segment]] <- list()
rPSC vs non-rPSC
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 1127 ASV(s)
Removing 27 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  138  samples and  466  features will be tested!
Warning: Some features have less than 3 nonzero values! 
                        They have virtually no statistical power. You may consider filtering them in the analysis!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
rPSC vs healthy
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 1214 ASV(s)
Removing 122 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  106  samples and  554  features will be tested!
Warning: Some features have less than 3 nonzero values! 
                        They have virtually no statistical power. You may consider filtering them in the analysis!
Pseudo-count approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 392 ASV(s)
Removing 91 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  178  samples and  460  features will be tested!
Pseudo-count approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
min_clr -0.9733804 
max_clr 4.946497 
min_log -4.9062 
max_log 5.67034 
dotheatmap_linda

rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 

Phylum level

level="phylum"
path_maaslin="../intermediate_files/maaslin/Q2/Phylum/"
raw_linda_results_phylum <- list();
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum <- list(); 
linda_results_phylum[[segment]] <- list()

Aggregate taxa

phylum_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = "Phylum")

ileum_phylum_tab <- phylum_data[[1]]
ileum_phylum_taxa_tab <- phylum_data[[2]]
rPSC vs non-rPSC
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 3 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  138  samples and  11  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
rPSC vs healthy
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  106  samples and  11  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
[1] "healthy"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  178  samples and  10  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
min_clr -4.182065 
max_clr 4.949748 
min_log -2.332431 
max_log 2.832391 
dotheatmap_linda

rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
NULL

Saving results

# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect,
          file.path(path,paste0("supplements_rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))

Machine learning

path = "../results/Q2/models"

ElasticNet

model="enet"

ASV level

level="ASV"
rPSC vs non-rPSC
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3123848
auc                             0.5000000
auc_czech                       0.5000000
auc_no                          0.5000000
auc_optimism_corrected          0.5929044
auc_optimism_corrected_CIL      0.4774822
auc_optimism_corrected_CIU      0.6968581
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7325629
accuracy_optimism_corrected_CIL 0.6374152
accuracy_optimism_corrected_CIU 0.8114948
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot

roc_c

rPSC vs healthy
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1112858
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9566799
auc_optimism_corrected_CIL      0.8962456
auc_optimism_corrected_CIU      0.9859752
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8766248
accuracy_optimism_corrected_CIL 0.7993421
accuracy_optimism_corrected_CIU 0.9369643
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  0 33

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0  9
enet_model$plot


roc_c

non-rPSC vs healthy
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.0310618
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9355688
auc_optimism_corrected_CIL      0.8641787
auc_optimism_corrected_CIU      0.9782348
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8680780
accuracy_optimism_corrected_CIL 0.7842294
accuracy_optimism_corrected_CIU 0.9209600
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 105

$czech
    Predicted
True  0  1
   0 37  0
   1  0 78

$no
    Predicted
True  0  1
   0 36  0
   1  0 27
enet_model$plot


roc_c

rPSC effect
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.2737528
auc                             0.6350649
auc_czech                       0.6089744
auc_no                          0.7037037
auc_optimism_corrected          0.6554041
auc_optimism_corrected_CIL      0.5197632
auc_optimism_corrected_CIU      0.7667220
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7583172
accuracy_optimism_corrected_CIL 0.6754902
accuracy_optimism_corrected_CIU 0.8096875
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot


roc_c

Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
rPSC vs non-rPSC
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3119645
auc                             0.6432900
auc_czech                       0.6567842
auc_no                          0.6069959
auc_optimism_corrected          0.5391113
auc_optimism_corrected_CIL      0.4255404
auc_optimism_corrected_CIU      0.6354965
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.6881402
accuracy_optimism_corrected_CIL 0.6073201
accuracy_optimism_corrected_CIU 0.7746941
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot


roc_c

rPSC vs healthy
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1466508
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9240344
auc_optimism_corrected_CIL      0.8003744
auc_optimism_corrected_CIU      0.9821934
accuracy                        0.9811321
accuracy_czech                        NaN
accuracy_no                     0.9777778
accuracy_optimism_corrected     0.8625555
accuracy_optimism_corrected_CIL 0.7755383
accuracy_optimism_corrected_CIU 0.9328125
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  2 31

$czech
    Predicted
True  0  1
   0 37  0
   1  1 23

$no
    Predicted
True  0  1
   0 36  0
   1  1  8
enet_model$plot


roc_c

non-rPSC vs healthy
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,group, 
                                     usage="ml_clr")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.0419208
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9461298
auc_optimism_corrected_CIL      0.8722130
auc_optimism_corrected_CIU      0.9819748
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8701671
accuracy_optimism_corrected_CIL 0.7972015
accuracy_optimism_corrected_CIU 0.9440068
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 105

$czech
    Predicted
True  0  1
   0 37  0
   1  0 78

$no
    Predicted
True  0  1
   0 36  0
   1  0 27
enet_model$plot

roc_c

rPSC effect
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3616408
auc                             0.5000000
auc_czech                       0.5000000
auc_no                          0.5000000
auc_optimism_corrected          0.6729259
auc_optimism_corrected_CIL      0.6036558
auc_optimism_corrected_CIU      0.7434551
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7604718
accuracy_optimism_corrected_CIL 0.6916360
accuracy_optimism_corrected_CIU 0.8096875
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot


roc_c

Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

Supplementary models

supplements_models <- list()

CLR-transformed data

kNN
model="knn"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.7075036
auc_optimism_corrected           0.5690333
auc_optimism_corrected_CIL       0.3766176
auc_optimism_corrected_CIU       0.6993946
accuracy                         0.7608696
accuracy_optimism_corrected      0.7565703
accuracy_optimism_corrected_CIL  0.6916360
accuracy_optimism_corrected_CIU  0.7916667
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9618099
auc_optimism_corrected           0.8903596
auc_optimism_corrected_CIL       0.8195214
auc_optimism_corrected_CIU       0.9472403
accuracy                         0.8301887
accuracy_optimism_corrected      0.7865538
accuracy_optimism_corrected_CIL  0.6431652
accuracy_optimism_corrected_CIU  0.8817460
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.9123940
auc_optimism_corrected           0.8432204
auc_optimism_corrected_CIL       0.7626955
auc_optimism_corrected_CIU       0.9176372
accuracy                         0.8426966
accuracy_optimism_corrected      0.7828159
accuracy_optimism_corrected_CIL  0.6389927
accuracy_optimism_corrected_CIU  0.8876080
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               15.0000000
auc                              0.7424242
auc_optimism_corrected           0.6372846
auc_optimism_corrected_CIL       0.5144382
auc_optimism_corrected_CIU       0.7584529
accuracy                         0.7753623
accuracy_optimism_corrected      0.7609173
accuracy_optimism_corrected_CIL  0.6960478
accuracy_optimism_corrected_CIU  0.8125000
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.6545455
auc_optimism_corrected           0.4773881
auc_optimism_corrected_CIL       0.3668218
auc_optimism_corrected_CIU       0.5755595
accuracy                         0.7608696
accuracy_optimism_corrected      0.7565703
accuracy_optimism_corrected_CIL  0.6916360
accuracy_optimism_corrected_CIU  0.7981250
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9798672
auc_optimism_corrected           0.8794371
auc_optimism_corrected_CIL       0.7240942
auc_optimism_corrected_CIU       0.9734677
accuracy                         0.7830189
accuracy_optimism_corrected      0.7505292
accuracy_optimism_corrected_CIL  0.6652256
accuracy_optimism_corrected_CIU  0.8442857
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9592955
auc_optimism_corrected           0.9016547
auc_optimism_corrected_CIL       0.8550021
auc_optimism_corrected_CIU       0.9489797
accuracy                         0.8483146
accuracy_optimism_corrected      0.7421026
accuracy_optimism_corrected_CIL  0.6488462
accuracy_optimism_corrected_CIU  0.8474315
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.6939394
auc_optimism_corrected           0.5806831
auc_optimism_corrected_CIL       0.4759774
auc_optimism_corrected_CIU       0.7342562
accuracy                         0.7608696
accuracy_optimism_corrected      0.7466992
accuracy_optimism_corrected_CIL  0.6561581
accuracy_optimism_corrected_CIU  0.8096875
roc_c

Random Forest
model="rf"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "121"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.5849764"
auc_optimism_corrected_CIL      "0.4977469"
auc_optimism_corrected_CIU      "0.6674945"
accuracy                        "1"        
accuracy_optimism_corrected     "0.758507" 
accuracy_optimism_corrected_CIL "0.6912377"
accuracy_optimism_corrected_CIU "0.8275641"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "325"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9211468"
auc_optimism_corrected_CIL      "0.7937378"
auc_optimism_corrected_CIU      "0.972576" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8046381"
accuracy_optimism_corrected_CIL "0.6938995"
accuracy_optimism_corrected_CIU "0.8602183"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "87"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.8758431"
auc_optimism_corrected_CIL      "0.8091548"
auc_optimism_corrected_CIU      "0.9587847"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8426825"
accuracy_optimism_corrected_CIL "0.786062" 
accuracy_optimism_corrected_CIU "0.9341625"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "3"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "0.9998557"
auc_optimism_corrected          "0.5509912"
auc_optimism_corrected_CIL      "0.3424858"
auc_optimism_corrected_CIU      "0.6863205"
accuracy                        "0.9927536"
accuracy_optimism_corrected     "0.7465253"
accuracy_optimism_corrected_CIL "0.7064338"
accuracy_optimism_corrected_CIU "0.7869668"
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "137"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.6037616"
auc_optimism_corrected_CIL      "0.4547491"
auc_optimism_corrected_CIU      "0.6788593"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7519543"
accuracy_optimism_corrected_CIL "0.6865502"
accuracy_optimism_corrected_CIU "0.8258333"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "79"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9413444"
auc_optimism_corrected_CIL      "0.8586957"
auc_optimism_corrected_CIU      "0.981984" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8364897"
accuracy_optimism_corrected_CIL "0.6866541"
accuracy_optimism_corrected_CIU "0.9322768"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "99"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9028857"
auc_optimism_corrected_CIL      "0.8299031"
auc_optimism_corrected_CIU      "0.9638187"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8669888"
accuracy_optimism_corrected_CIL "0.7925"   
accuracy_optimism_corrected_CIU "0.9550601"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "7"        
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.5992598"
auc_optimism_corrected_CIL      "0.5281838"
auc_optimism_corrected_CIU      "0.6611226"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7549795"
accuracy_optimism_corrected_CIL "0.6912377"
accuracy_optimism_corrected_CIU "0.8046875"
roc_c

Gradient boosting
model="gb"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.5174986
auc_optimism_corrected_CIL        0.4020749
auc_optimism_corrected_CIU        0.6121226
accuracy                          1.0000000
accuracy_optimism_corrected       0.7345733
accuracy_optimism_corrected_CIL   0.6916360
accuracy_optimism_corrected_CIU   0.7679720
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9177655
auc_optimism_corrected_CIL        0.8539862
auc_optimism_corrected_CIU        0.9568388
accuracy                          1.0000000
accuracy_optimism_corrected       0.8083337
accuracy_optimism_corrected_CIL   0.7415072
accuracy_optimism_corrected_CIU   0.8709821
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   30.0000000
auc                               1.0000000
auc_optimism_corrected            0.9183323
auc_optimism_corrected_CIL        0.8789748
auc_optimism_corrected_CIU        0.9764957
accuracy                          1.0000000
accuracy_optimism_corrected       0.8562009
accuracy_optimism_corrected_CIL   0.8014516
accuracy_optimism_corrected_CIU   0.9186104
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               0.9161616
auc_optimism_corrected            0.6089129
auc_optimism_corrected_CIL        0.4836930
auc_optimism_corrected_CIU        0.6714232
accuracy                          0.8550725
accuracy_optimism_corrected       0.7623020
accuracy_optimism_corrected_CIL   0.7121936
accuracy_optimism_corrected_CIU   0.8241987
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.5338402
auc_optimism_corrected_CIL        0.4365296
auc_optimism_corrected_CIU        0.6661672
accuracy                          1.0000000
accuracy_optimism_corrected       0.7037208
accuracy_optimism_corrected_CIL   0.6666667
accuracy_optimism_corrected_CIU   0.7421503
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9409113
auc_optimism_corrected_CIL        0.8776902
auc_optimism_corrected_CIU        0.9854808
accuracy                          1.0000000
accuracy_optimism_corrected       0.8497037
accuracy_optimism_corrected_CIL   0.7726974
accuracy_optimism_corrected_CIU   0.9564955
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9233013
auc_optimism_corrected_CIL        0.8738388
auc_optimism_corrected_CIU        0.9661456
accuracy                          1.0000000
accuracy_optimism_corrected       0.8521988
accuracy_optimism_corrected_CIL   0.7930597
accuracy_optimism_corrected_CIU   0.9070330
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6077735
auc_optimism_corrected_CIL        0.4459963
auc_optimism_corrected_CIU        0.7540255
accuracy                          1.0000000
accuracy_optimism_corrected       0.7337445
accuracy_optimism_corrected_CIL   0.6514706
accuracy_optimism_corrected_CIU   0.8208333
roc_c

Relative abundances

Elastic net
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3152294
auc                             0.5779221
auc_czech                       0.5774573
auc_no                          0.5781893
auc_optimism_corrected          0.6000705
auc_optimism_corrected_CIL      0.5004649
auc_optimism_corrected_CIU      0.7793269
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7528585
accuracy_optimism_corrected_CIL 0.6105699
accuracy_optimism_corrected_CIU 0.8394891
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.1081973
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9529088
auc_optimism_corrected_CIL      0.8878711
auc_optimism_corrected_CIU      0.9894988
accuracy                        0.9905660
accuracy_czech                        NaN
accuracy_no                     0.9777778
accuracy_optimism_corrected     0.8597920
accuracy_optimism_corrected_CIL 0.7760628
accuracy_optimism_corrected_CIU 0.9045455
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.60000000
lambda                          0.03782134
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.90635619
auc_optimism_corrected_CIL      0.82872212
auc_optimism_corrected_CIU      0.95835084
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.85064780
accuracy_optimism_corrected_CIL 0.77397643
accuracy_optimism_corrected_CIU 0.91861042
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.2328208
auc                             0.5000000
auc_czech                       0.5000000
auc_no                          0.5000000
auc_optimism_corrected          0.5690894
auc_optimism_corrected_CIL      0.4746083
auc_optimism_corrected_CIU      0.6652762
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7504092
accuracy_optimism_corrected_CIL 0.6865502
accuracy_optimism_corrected_CIU 0.7981250
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3212085
auc                             0.5000000
auc_czech                       0.5000000
auc_no                          0.5000000
auc_optimism_corrected          0.5898005
auc_optimism_corrected_CIL      0.4531015
auc_optimism_corrected_CIU      0.7474247
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7403522
accuracy_optimism_corrected_CIL 0.6561581
accuracy_optimism_corrected_CIU 0.8318910
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           1.00000000
lambda                          0.01972607
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.85174703
auc_optimism_corrected_CIL      0.66521739
auc_optimism_corrected_CIU      0.94274500
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.79372737
accuracy_optimism_corrected_CIL 0.67350478
accuracy_optimism_corrected_CIU 0.87098214
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.40000000
lambda                          0.05642506
auc                             0.99791259
auc_czech                       0.99861400
auc_no                          0.99794239
auc_optimism_corrected          0.90921375
auc_optimism_corrected_CIL      0.83611111
auc_optimism_corrected_CIU      0.95998026
accuracy                        0.96067416
accuracy_czech                         NaN
accuracy_no                     0.96825397
accuracy_optimism_corrected     0.82634576
accuracy_optimism_corrected_CIL 0.75730769
accuracy_optimism_corrected_CIU 0.87977038
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.2045333
auc                             0.5000000
auc_czech                       0.5000000
auc_no                          0.5000000
auc_optimism_corrected          0.5361341
auc_optimism_corrected_CIL      0.4551996
auc_optimism_corrected_CIU      0.6820732
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7661738
accuracy_optimism_corrected_CIL 0.6960478
accuracy_optimism_corrected_CIU 0.8114183
roc_c

kNN
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               13.0000000
auc                              0.6914863
auc_optimism_corrected           0.4364683
auc_optimism_corrected_CIL       0.3442543
auc_optimism_corrected_CIU       0.6062799
accuracy                         0.7536232
accuracy_optimism_corrected      0.7603292
accuracy_optimism_corrected_CIL  0.6916360
accuracy_optimism_corrected_CIU  0.8096875
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               13.0000000
auc                              0.8756746
auc_optimism_corrected           0.7384519
auc_optimism_corrected_CIL       0.5847403
auc_optimism_corrected_CIU       0.8909709
accuracy                         0.6981132
accuracy_optimism_corrected      0.6905598
accuracy_optimism_corrected_CIL  0.5667398
accuracy_optimism_corrected_CIU  0.7935714
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               23.0000000
auc                              0.8599478
auc_optimism_corrected           0.7960201
auc_optimism_corrected_CIL       0.7319426
auc_optimism_corrected_CIU       0.8941088
accuracy                         0.7977528
accuracy_optimism_corrected      0.7173750
accuracy_optimism_corrected_CIL  0.6103846
accuracy_optimism_corrected_CIU  0.8006849
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               30.0000000
auc                              0.7507937
auc_optimism_corrected           0.6012685
auc_optimism_corrected_CIL       0.4582148
auc_optimism_corrected_CIU       0.7079090
accuracy                         0.7608696
accuracy_optimism_corrected      0.7445805
accuracy_optimism_corrected_CIL  0.6713542
accuracy_optimism_corrected_CIU  0.8070913
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               12.0000000
auc                              0.7204906
auc_optimism_corrected           0.5247021
auc_optimism_corrected_CIL       0.4504327
auc_optimism_corrected_CIU       0.6254728
accuracy                         0.7681159
accuracy_optimism_corrected      0.7526688
accuracy_optimism_corrected_CIL  0.6754902
accuracy_optimism_corrected_CIU  0.7916667
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.8897883
auc_optimism_corrected           0.7275478
auc_optimism_corrected_CIL       0.5635054
auc_optimism_corrected_CIU       0.8718839
accuracy                         0.7641509
accuracy_optimism_corrected      0.7317397
accuracy_optimism_corrected_CIL  0.6065789
accuracy_optimism_corrected_CIU  0.8249554
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               25.0000000
auc                              0.8847358
auc_optimism_corrected           0.8273708
auc_optimism_corrected_CIL       0.7218575
auc_optimism_corrected_CIU       0.8887756
accuracy                         0.8202247
accuracy_optimism_corrected      0.7458772
accuracy_optimism_corrected_CIL  0.6807692
accuracy_optimism_corrected_CIU  0.8203125
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               13.0000000
auc                              0.7734488
auc_optimism_corrected           0.5463875
auc_optimism_corrected_CIL       0.3858295
auc_optimism_corrected_CIU       0.6525333
accuracy                         0.7753623
accuracy_optimism_corrected      0.7408946
accuracy_optimism_corrected_CIL  0.6865502
accuracy_optimism_corrected_CIU  0.7885417
roc_c

Random Forest
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "45"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.6294299"
auc_optimism_corrected_CIL      "0.5442163"
auc_optimism_corrected_CIU      "0.7521896"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7583885"
accuracy_optimism_corrected_CIL "0.691636" 
accuracy_optimism_corrected_CIU "0.798125" 
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "283"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9400127"
auc_optimism_corrected_CIL      "0.8682876"
auc_optimism_corrected_CIU      "0.9915483"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8348226"
accuracy_optimism_corrected_CIL "0.7580357"
accuracy_optimism_corrected_CIU "0.8881746"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "65"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.909026" 
auc_optimism_corrected_CIL      "0.8608662"
auc_optimism_corrected_CIU      "0.9626096"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8424148"
accuracy_optimism_corrected_CIL "0.7595179"
accuracy_optimism_corrected_CIU "0.9007692"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "1"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "0.9943723"
auc_optimism_corrected          "0.63524"  
auc_optimism_corrected_CIL      "0.5536503"
auc_optimism_corrected_CIU      "0.7122773"
accuracy                        "0.8623188"
accuracy_optimism_corrected     "0.7325812"
accuracy_optimism_corrected_CIL "0.6865502"
accuracy_optimism_corrected_CIU "0.7682127"
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "1"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.6340995"
auc_optimism_corrected_CIL      "0.560939" 
auc_optimism_corrected_CIU      "0.7418293"
accuracy                        "0.9057971"
accuracy_optimism_corrected     "0.7350835"
accuracy_optimism_corrected_CIL "0.6862745"
accuracy_optimism_corrected_CIU "0.7708333"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "203"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9497313"
auc_optimism_corrected_CIL      "0.8715106"
auc_optimism_corrected_CIU      "0.9941356"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8596052"
accuracy_optimism_corrected_CIL "0.7941587"
accuracy_optimism_corrected_CIU "0.9322768"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "7"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9295135"
auc_optimism_corrected_CIL      "0.8657029"
auc_optimism_corrected_CIU      "0.9839446"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8739723"
accuracy_optimism_corrected_CIL "0.8014516"
accuracy_optimism_corrected_CIU "0.9344505"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "1"        
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.6155945"
auc_optimism_corrected_CIL      "0.5505441"
auc_optimism_corrected_CIU      "0.711145" 
accuracy                        "0.8985507"
accuracy_optimism_corrected     "0.7484636"
accuracy_optimism_corrected_CIL "0.6514706"
accuracy_optimism_corrected_CIU "0.8309943"
roc_c

Gradient boosting
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               0.9927850
auc_optimism_corrected            0.5068135
auc_optimism_corrected_CIL        0.4057158
auc_optimism_corrected_CIU        0.5988107
accuracy                          0.9275362
accuracy_optimism_corrected       0.7426751
accuracy_optimism_corrected_CIL   0.6710784
accuracy_optimism_corrected_CIU   0.8096875
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 1214 ASV(s)
Removing 122 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9498809
auc_optimism_corrected_CIL        0.9147640
auc_optimism_corrected_CIU        0.9610390
accuracy                          1.0000000
accuracy_optimism_corrected       0.8697992
accuracy_optimism_corrected_CIL   0.7827778
accuracy_optimism_corrected_CIU   0.9342548
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")
Removing 392 ASV(s)
Removing 91 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9222032
auc_optimism_corrected_CIL        0.8621283
auc_optimism_corrected_CIU        0.9842776
accuracy                          1.0000000
accuracy_optimism_corrected       0.8424909
accuracy_optimism_corrected_CIL   0.7446429
accuracy_optimism_corrected_CIU   0.9167308
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1127 ASV(s)
Removing 27 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               0.8629149
auc_optimism_corrected            0.6394533
auc_optimism_corrected_CIL        0.5887956
auc_optimism_corrected_CIU        0.7095918
accuracy                          0.7971014
accuracy_optimism_corrected       0.7336972
accuracy_optimism_corrected_CIL   0.6912377
accuracy_optimism_corrected_CIU   0.7694547
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               0.9985570
auc_optimism_corrected            0.4493813
auc_optimism_corrected_CIL        0.3608186
auc_optimism_corrected_CIU        0.5408939
accuracy                          0.9855072
accuracy_optimism_corrected       0.6803940
accuracy_optimism_corrected_CIL   0.6117034
accuracy_optimism_corrected_CIU   0.7307692
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 67 ASV(s)
Removing 4 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9270279
auc_optimism_corrected_CIL        0.8614764
auc_optimism_corrected_CIU        0.9870724
accuracy                          1.0000000
accuracy_optimism_corrected       0.8379402
accuracy_optimism_corrected_CIL   0.7424825
accuracy_optimism_corrected_CIU   0.9322768
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")
Removing 24 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9520042
auc_optimism_corrected_CIL        0.8997708
auc_optimism_corrected_CIU        0.9953416
accuracy                          1.0000000
accuracy_optimism_corrected       0.8951099
accuracy_optimism_corrected_CIL   0.8153846
accuracy_optimism_corrected_CIU   0.9646154
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6146849
auc_optimism_corrected_CIL        0.5088894
auc_optimism_corrected_CIU        0.7767575
accuracy                          1.0000000
accuracy_optimism_corrected       0.7447143
accuracy_optimism_corrected_CIL   0.6514706
accuracy_optimism_corrected_CIU   0.8088542
roc_c

Saving results

models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

Results overview

Alpha diversity

pc_observed[[segment]]
pc_shannon[[segment]]
pc_simpson[[segment]]
pc_pielou[[segment]]

Plots

alpha_div_plots[[paste(segment,"Country")]]

alpha_div_plots[[paste(segment,"Custom")]]

Beta diversity

Main results

pairwise_aitchison_raw[[paste("genus", segment)]]

PCA

pca_plots_list[[paste(segment,"genus custom")]]

Supplements

knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta))],
             digits = 3,
             caption = "Supplementary PERMANOVA results")
Supplementary PERMANOVA results
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.251 1.208 0.008 0.241 0.241
rPSC vs healthy 1 0.813 4.432 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.241 6.538 0.034 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.449 6.976 0.049 0.001 0.001 ***
rPSC vs healthy , Country 1 0.972 5.300 0.047 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.336 7.038 0.037 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.151 0.724 0.005 0.803 0.803
rPSC vs healthy : Country 1 0.287 1.572 0.014 0.067 0.100
non-rPSC vs healthy : Country 1 0.354 1.873 0.010 0.011 0.033 *
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.337 1.138 0.008 0.241 0.241
rPSC vs healthy 1 0.887 3.269 0.030 0.001 0.002 **
non-rPSC vs healthy 1 1.349 4.846 0.026 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.460 4.927 0.035 0.001 0.001 ***
rPSC vs healthy , Country 1 1.092 4.023 0.036 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.476 5.300 0.029 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.242 0.817 0.006 0.823 0.823
rPSC vs healthy : Country 1 0.342 1.261 0.011 0.118 0.177
non-rPSC vs healthy : Country 1 0.445 1.604 0.009 0.014 0.042 *
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 316.296 1.176 0.008 0.071 0.071
rPSC vs healthy 1 830.867 2.897 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1156.137 4.137 0.023 0.001 0.002 **
rPSC vs non-rPSC , Country 1 639.519 2.378 0.017 0.001 0.001 ***
rPSC vs healthy , Country 1 598.496 2.087 0.019 0.001 0.001 ***
non-rPSC vs healthy , Country 1 752.840 2.694 0.015 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 222.179 0.825 0.006 0.960 0.960
rPSC vs healthy : Country 1 262.783 0.916 0.008 0.746 0.960
non-rPSC vs healthy : Country 1 291.928 1.045 0.006 0.333 0.960
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.505 1.557 0.011 0.026 0.026 *
rPSC vs healthy 1 1.342 4.364 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.979 6.424 0.034 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.444 4.451 0.032 0.001 0.001 ***
rPSC vs healthy , Country 1 1.059 3.444 0.031 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.394 4.525 0.024 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.333 1.027 0.007 0.386 0.386
rPSC vs healthy : Country 1 0.390 1.272 0.011 0.093 0.140
non-rPSC vs healthy : Country 1 0.447 1.456 0.008 0.036 0.108
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.518 1.318 0.009 0.029 0.029 *
rPSC vs healthy 1 1.117 2.934 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1.607 4.216 0.023 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.214 3.089 0.022 0.001 0.001 ***
rPSC vs healthy , Country 1 0.939 2.467 0.023 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.203 3.157 0.017 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.390 0.992 0.007 0.469 0.469
rPSC vs healthy : Country 1 0.430 1.130 0.010 0.131 0.196
non-rPSC vs healthy : Country 1 0.501 1.318 0.007 0.026 0.078

PCA

ggarrange(plotlist = supplements_beta[grepl("PCoA",names(supplements_beta))],
          labels=names(supplements_beta[grepl("PCoA",names(supplements_beta))]),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)

Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus non-rPSC vs rPSC 0
terminal_ileum genus healthy vs rPSC 34
terminal_ileum genus healthy vs non-rPSC 25
terminal_ileum ASV non-rPSC vs rPSC 0
terminal_ileum ASV healthy vs rPSC 46
terminal_ileum ASV healthy vs non-rPSC 48
terminal_ileum phylum non-rPSC vs rPSC 0
terminal_ileum phylum rPSC vs healthy 3
terminal_ileum phylum healthy vs non-rPSC 3
PSC effect ASV 16
PSC effect Genus 19

Counts

# univar_list <- univariate_statistics(list_intersections,
#                                      psc_effect,
#                                      ileum_genus_asv_taxa_tab)
# 
# univar_df <- univar_list[[1]]
# wb <- univar_list[[2]]
# 
# # save the results
# saveWorkbook(wb,"results/Q1/DAA_final_terminal_ileum.xlsx", overwrite = TRUE)
# 
# # see the results
# univar_df

Machine learning

Main models

Summary

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
rPSC vs non-rPSC ASV terminal_ileum 0.4 0.31 0.59 0.48 0.70
rPSC vs healthy ASV terminal_ileum 0.2 0.11 0.96 0.90 0.99
non-rPSC vs healthy ASV terminal_ileum 0.2 0.03 0.94 0.86 0.98
rPSC effect ASV terminal_ileum 0.4 0.27 0.66 0.52 0.77
rPSC vs non-rPSC genus terminal_ileum 0.4 0.31 0.54 0.43 0.64
rPSC vs healthy genus terminal_ileum 0.2 0.15 0.92 0.80 0.98
non-rPSC vs healthy genus terminal_ileum 0.2 0.04 0.95 0.87 0.98
rPSC effect genus terminal_ileum 0.4 0.36 0.67 0.60 0.74

ROC - ASV level

roc_curve_all_custom(roc_cs[c(1:4)], 
                     Q="Q2",
                     model_name="enet_model")
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
Please use `linewidth` instead.Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values

ROC - Genus level

roc_curve_all_custom(roc_cs[c(5:8)],Q="Q2",
                     model_name="enet_model")
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values

Supplementary models

Summary

# Build final dataframe
models_list[["enet_model"]] <- models_summ_df_ileum
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
All models
row_names knn_model rf_model gbm_model enet_model_ra knn_model_ra rf_model_ra gbm_model_ra enet_model
rPSC vs non-rPSC ASV terminal_ileum 0.569 (0.377; 0.699) 0.585 (0.498; 0.667) 0.517 (0.402; 0.612) 0.6 (0.5; 0.779) 0.436 (0.344; 0.606) 0.629 (0.544; 0.752) 0.507 (0.406; 0.599) 0.593 (0.477; 0.697)
rPSC vs healthy ASV terminal_ileum 0.89 (0.82; 0.947) 0.921 (0.794; 0.973) 0.918 (0.854; 0.957) 0.953 (0.888; 0.989) 0.738 (0.585; 0.891) 0.94 (0.868; 0.992) 0.95 (0.915; 0.961) 0.957 (0.896; 0.986)
non-rPSC vs healthy ASV terminal_ileum 0.843 (0.763; 0.918) 0.876 (0.809; 0.959) 0.918 (0.879; 0.976) 0.906 (0.829; 0.958) 0.796 (0.732; 0.894) 0.909 (0.861; 0.963) 0.922 (0.862; 0.984) 0.936 (0.864; 0.978)
rPSC effect ASV terminal_ileum 0.637 (0.514; 0.758) 0.551 (0.342; 0.686) 0.609 (0.484; 0.671) 0.569 (0.475; 0.665) 0.601 (0.458; 0.708) 0.635 (0.554; 0.712) 0.639 (0.589; 0.71) 0.655 (0.52; 0.767)
rPSC vs non-rPSC genus terminal_ileum 0.477 (0.367; 0.576) 0.604 (0.455; 0.679) 0.534 (0.437; 0.666) 0.59 (0.453; 0.747) 0.525 (0.45; 0.625) 0.634 (0.561; 0.742) 0.449 (0.361; 0.541) 0.539 (0.426; 0.635)
rPSC vs healthy genus terminal_ileum 0.879 (0.724; 0.973) 0.941 (0.859; 0.982) 0.941 (0.878; 0.985) 0.852 (0.665; 0.943) 0.728 (0.564; 0.872) 0.95 (0.872; 0.994) 0.927 (0.861; 0.987) 0.924 (0.8; 0.982)
non-rPSC vs healthy genus terminal_ileum 0.902 (0.855; 0.949) 0.903 (0.83; 0.964) 0.923 (0.874; 0.966) 0.909 (0.836; 0.96) 0.827 (0.722; 0.889) 0.93 (0.866; 0.984) 0.952 (0.9; 0.995) 0.946 (0.872; 0.982)
rPSC effect genus terminal_ileum 0.581 (0.476; 0.734) 0.599 (0.528; 0.661) 0.608 (0.446; 0.754) 0.536 (0.455; 0.682) 0.546 (0.386; 0.653) 0.616 (0.551; 0.711) 0.615 (0.509; 0.777) 0.673 (0.604; 0.743)

ROC - ASV

rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(1:4)],
                       Q="Q2",
                       model_name=model_name)
}
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))

ROC - genus

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name)
}
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
p

pdf("../figures/Q2/models_ileum.pdf",
    height =10,width = 10)
p
dev.off()

Analysis - Colon

segment="colon"

Filtering

Rules: - prevalence > 5% (per group) - nearZeroVar with default settings - sequencing depth > 5000 - taxonomic assignment at least order

Rarefaction Curve

path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))
load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)
Warning: NaNs producedThe color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)
Removing 50 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]]; 

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)

Alpha diversity

path = "../results/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

Plots

Country plot

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

pdf("../figures/Q2/alpha_diversity_colon.pdf",
    height =4,width = 7)
p_boxplot_alpha
dev.off()

Custom plot

alpha_data <- alpha_data %>% 
  dplyr::select(-c("Simpson","Pielou")) %>%
  mutate(Richness=Observe)

p_B <- alpha_diversity_custom_2(alpha_data,
                                size = 1.5,
                                width = 0.3)
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Custom")]] <- p_B

p_B

Linear Model

path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -17.252 12.469 156.883 -1.384 0.168 0.192
non-rPSC vs rPSC - CZ vs NO -17.941 10.644 153.303 -1.686 0.094 0.192
non-rPSC vs GrouprPSC:CountryNO 3.483 21.371 152.108 0.163 0.871 0.871
healthy vs GrouprPSC -40.307 11.924 125.516 -3.380 0.001 0.009 **
healthy vs rPSC - CZ vs NO 14.783 10.544 126.613 1.402 0.163 0.192
healthy vs GrouprPSC:CountryNO -29.232 19.769 123.028 -1.479 0.142 0.192
healthy vs Groupnon-rPSC -22.976 8.905 211.354 -2.580 0.011 0.047 *
healthy vs non-rPSC - CZ vs NO 14.961 10.892 212.956 1.374 0.171 0.192
healthy vs Groupnon-rPSC:CountryNO -32.911 14.701 208.353 -2.239 0.026 0.079

knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 171.507 6.650 77.696 25.792 0.000 NA
GrouprPSC -40.316 11.792 76.971 -3.419 0.001 0.002
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 186.301 8.264 48.971 22.543 0 NA
GrouprPSC -69.557 16.036 46.963 -4.338 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 171.468 6.275 89.628 27.325 0.000 NA
CountryNO 14.904 9.836 89.468 1.515 0.133 0.178
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 131.179 11.294 36.279 11.615 0.000 NA
CountryNO -14.435 19.236 35.739 -0.750 0.458 0.458
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 171.439 6.808 137.314 25.182 0.000 NA
Groupnon-rPSC -22.949 8.722 135.888 -2.631 0.009 0.019
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 186.365 8.682 75.472 21.467 0 NA
Groupnon-rPSC -55.850 12.120 73.619 -4.608 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 171.468 6.275 89.628 27.325 0.000 NA
CountryNO 14.904 9.836 89.468 1.515 0.133 0.133
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 148.477 5.932 120.279 25.028 0.000 NA
CountryNO -17.950 10.519 117.183 -1.706 0.091 0.121

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.146 0.161 154.485 -0.907 0.366 0.549
non-rPSC vs rPSC - CZ vs NO -0.355 0.138 151.788 -2.581 0.011 0.051
non-rPSC vs GrouprPSC:CountryNO 0.008 0.277 150.856 0.030 0.976 0.976
healthy vs GrouprPSC -0.330 0.128 126.201 -2.572 0.011 0.051
healthy vs rPSC - CZ vs NO 0.017 0.113 127.217 0.146 0.884 0.976
healthy vs GrouprPSC:CountryNO -0.364 0.213 123.907 -1.709 0.090 0.162
healthy vs Groupnon-rPSC -0.182 0.101 207.051 -1.795 0.074 0.162
healthy vs non-rPSC - CZ vs NO 0.018 0.124 208.617 0.145 0.884 0.976
healthy vs Groupnon-rPSC:CountryNO -0.375 0.168 204.129 -2.240 0.026 0.078

knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.683 0.065 77.802 56.300 0.000 NA
GrouprPSC -0.330 0.116 76.679 -2.846 0.006 0.011
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.700 0.099 48.693 37.40 0.000 NA
GrouprPSC -0.693 0.193 47.827 -3.59 0.001 0.003
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.683 0.054 89.061 68.505 0.000 NA
CountryNO 0.019 0.084 88.828 0.231 0.818 0.818
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.354 0.153 36.206 21.939 0.000 NA
CountryNO -0.347 0.261 35.912 -1.332 0.191 0.255
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.684 0.081 133.503 45.582 0.000 NA
Groupnon-rPSC -0.184 0.104 132.547 -1.772 0.079 0.105
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.702 0.093 77.069 39.853 0 NA
Groupnon-rPSC -0.559 0.129 74.201 -4.324 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.683 0.054 89.061 68.505 0.000 NA
CountryNO 0.019 0.084 88.828 0.231 0.818 0.818
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 3.501 0.075 117.089 46.588 0.000 NA
CountryNO -0.357 0.133 114.586 -2.673 0.009 0.017

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.005 0.029 153.608 -0.169 0.866 0.866
non-rPSC vs rPSC - CZ vs NO -0.032 0.024 151.251 -1.296 0.197 0.443
non-rPSC vs GrouprPSC:CountryNO -0.027 0.049 150.427 -0.554 0.581 0.806
healthy vs GrouprPSC -0.031 0.020 127.660 -1.523 0.130 0.391
healthy vs rPSC - CZ vs NO -0.007 0.018 128.411 -0.402 0.688 0.806
healthy vs GrouprPSC:CountryNO -0.052 0.034 125.982 -1.540 0.126 0.391
healthy vs Groupnon-rPSC -0.026 0.016 201.924 -1.600 0.111 0.391
healthy vs non-rPSC - CZ vs NO -0.007 0.020 203.469 -0.363 0.717 0.806
healthy vs Groupnon-rPSC:CountryNO -0.025 0.026 199.046 -0.952 0.342 0.616

knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.944 0.008 79.314 121.920 0.000 NA
GrouprPSC -0.031 0.014 77.978 -2.245 0.028 0.057
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.936 0.019 48.621 50.151 0.000 NA
GrouprPSC -0.082 0.037 48.125 -2.258 0.029 0.057
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.943 0.004 90.956 214.963 0.000 NA
CountryNO -0.007 0.007 90.564 -0.990 0.325 0.325
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.913 0.029 36.306 31.229 0.000 NA
CountryNO -0.059 0.050 36.048 -1.181 0.245 0.325

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.007 0.026 153.863 -0.292 0.771 0.843
non-rPSC vs rPSC - CZ vs NO -0.052 0.022 150.735 -2.372 0.019 0.171
non-rPSC vs GrouprPSC:CountryNO -0.009 0.044 149.674 -0.198 0.843 0.843
healthy vs GrouprPSC -0.023 0.020 127.665 -1.183 0.239 0.538
healthy vs rPSC - CZ vs NO -0.008 0.017 129.013 -0.457 0.648 0.843
healthy vs GrouprPSC:CountryNO -0.052 0.033 124.565 -1.605 0.111 0.333
healthy vs Groupnon-rPSC -0.015 0.015 204.651 -1.006 0.316 0.568
healthy vs non-rPSC - CZ vs NO -0.008 0.019 206.473 -0.416 0.678 0.843
healthy vs Groupnon-rPSC:CountryNO -0.044 0.025 201.171 -1.745 0.083 0.333

knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.720 0.009 78.395 76.134 0.000 NA
GrouprPSC -0.023 0.017 77.018 -1.392 0.168 0.224
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.712 0.016 49.474 44.366 0.000 NA
GrouprPSC -0.076 0.031 47.962 -2.424 0.019 0.077
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.720 0.007 89.185 99.650 0.000 NA
CountryNO -0.008 0.011 88.856 -0.667 0.506 0.506
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.697 0.025 36.449 27.771 0.000 NA
CountryNO -0.060 0.043 35.950 -1.414 0.166 0.224
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.720 0.013 132.894 56.00 0.000 NA
Groupnon-rPSC -0.016 0.016 132.038 -0.97 0.334 0.445
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.713 0.013 79.537 55.174 0.000 NA
Groupnon-rPSC -0.060 0.018 74.995 -3.379 0.001 0.005
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.720 0.007 89.185 99.650 0.000 NA
CountryNO -0.008 0.011 88.856 -0.667 0.506 0.506
Estimate Std. Error df t value Pr(>|t|) p.adjusted
(Intercept) 0.705 0.012 115.555 60.129 0.000 NA
CountryNO -0.052 0.021 112.732 -2.499 0.014 0.028

Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

Main analysis - Genus, Aitchison

Genus level, Aitchison distance

level="genus"
path = "../results/Q2/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)
Removing 5 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]
PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 419.973 2.475 0.007 0.415 0.415
rPSC vs healthy 1 1473.966 8.988 0.035 0.001 0.002 **
non-rPSC vs healthy 1 1742.331 10.401 0.024 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1556.933 9.177 0.026 0.001 0.001 ***
rPSC vs healthy , Country 1 883.284 5.386 0.021 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1558.672 9.304 0.021 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 318.175 1.880 0.005 0.970 0.970
rPSC vs healthy : Country 1 319.594 1.956 0.008 0.820 0.970
non-rPSC vs healthy : Country 1 407.191 2.439 0.006 0.163 0.489

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}
Plots

PCoA custom

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

pdf("../figures/Q2/beta_diversity_colon.pdf",
    height =5,width = 5)
p
dev.off()

Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

Supplementary analysis

Genus level

level="genus"
Bray-Curtis

PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.504 2.488 0.007 0.267 0.267
rPSC vs healthy 1 2.464 13.675 0.051 0.001 0.002 **
non-rPSC vs healthy 1 3.227 18.295 0.039 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 5.037 24.881 0.066 0.001 0.001 ***
rPSC vs healthy , Country 1 2.136 11.852 0.044 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.670 26.476 0.057 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.446 2.209 0.006 0.460 0.460
rPSC vs healthy : Country 1 0.495 2.765 0.010 0.102 0.153
non-rPSC vs healthy : Country 1 1.025 5.881 0.012 0.001 0.003 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.415 0.03183 8.3495  0.001      0.001
Residual 254   43.052 0.96817                         
Total    255   44.467 1.00000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.837 0.08594 15.607  0.001      0.001
Residual 166   30.175 0.91406                         
Total    167   33.012 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Cov        1    4.273 0.0792 22.449  0.001      0.001
Residual 261   49.683 0.9208                         
Total    262   53.956 1.0000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1   1.4216 0.05694 9.6009  0.001      0.001
Residual 159  23.5437 0.94306                         
Total    160  24.9653 1.00000                         

Plots

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

Jaccard

PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country,
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country,
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.590 2.035 0.006 0.594 0.594
rPSC vs healthy 1 2.438 9.075 0.035 0.001 0.002 **
non-rPSC vs healthy 1 3.380 12.658 0.028 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 4.994 17.225 0.047 0.001 0.001 ***
rPSC vs healthy , Country 1 2.338 8.704 0.033 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.864 18.215 0.040 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.589 2.039 0.006 0.696 0.696
rPSC vs healthy : Country 1 0.632 2.365 0.009 0.208 0.312
non-rPSC vs healthy : Country 1 1.174 4.433 0.010 0.001 0.003 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.591 0.02354 6.1232  0.001      0.001
Residual 254   65.982 0.97646                         
Total    255   67.573 1.00000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    2.963 0.06145 10.869  0.001      0.001
Residual 166   45.256 0.93855                         
Total    167   48.219 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    4.325 0.05591 15.458  0.001      0.001
Residual 261   73.032 0.94409                         
Total    262   77.358 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Cov        1    1.712 0.0429 7.1261  0.001      0.001
Residual 159   38.206 0.9571                         
Total    160   39.918 1.0000                         

Plots

Custom

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

ASV level

level="ASV"
Aitchison
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(x=pairwise_df,
                          filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country, 
                           sim.method = "robust.aitchison", 
                           p.adjust.m="BH",
                           patients = filt_colon_metadata$Patient)

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          interaction = TRUE, 
                          sim.method = "robust.aitchison", 
                          p.adjust.m="BH",
                          patients = filt_colon_metadata$Patient)

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)

# see the results
pp_factor
pp_cov
pp_fac.cov
NA
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 470.424 1.893 0.005 0.968 0.968
rPSC vs healthy 1 1842.677 6.979 0.027 0.001 0.002 **
non-rPSC vs healthy 1 2475.754 9.500 0.022 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1644.116 6.616 0.019 0.001 0.001 ***
rPSC vs healthy , Country 1 1119.321 4.239 0.017 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1833.401 7.035 0.016 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 470.881 1.900 0.005 0.989 0.989
rPSC vs healthy : Country 1 517.888 1.969 0.008 0.896 0.989
non-rPSC vs healthy : Country 1 611.799 2.355 0.005 0.163 0.489

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient)
  print(result_list)
} 
}

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                           filt_colon_taxa_tab,
                           filt_colon_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", 
                           size=3, 
                           show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p

Bray-Curtis

PERMANOVA

# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.737 2.327 0.006 0.415 0.415
rPSC vs healthy 1 3.308 10.870 0.041 0.001 0.002 **
non-rPSC vs healthy 1 4.794 16.082 0.036 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 4.600 14.526 0.040 0.001 0.001 ***
rPSC vs healthy , Country 1 2.184 7.177 0.027 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.330 14.527 0.032 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.792 2.512 0.007 0.383 0.383
rPSC vs healthy : Country 1 0.815 2.697 0.010 0.145 0.217
non-rPSC vs healthy : Country 1 1.360 4.601 0.010 0.002 0.006 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient,
                      sim.method = 'bray')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs    R2      F Pr(>F) p.adjusted
Fac        1    2.255 0.029 7.5854  0.001      0.001
Residual 254   75.510 0.971                         
Total    255   77.765 1.000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    3.899 0.07422 13.309  0.001      0.001
Residual 166   48.631 0.92578                         
Total    167   52.530 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    4.042 0.04823 13.227  0.001      0.001
Residual 261   79.749 0.95177                         
Total    262   83.790 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.649 0.03581 5.9053  0.001      0.001
Residual 159   44.393 0.96419                         
Total    160   46.041 1.00000                         

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p

Jaccard

PERMANOVA

# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.737 1.909 0.005 0.873 0.873
rPSC vs healthy 1 2.563 6.794 0.026 0.001 0.002 **
non-rPSC vs healthy 1 3.757 10.047 0.023 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 3.698 9.578 0.027 0.001 0.001 ***
rPSC vs healthy , Country 1 1.887 5.001 0.019 0.001 0.001 ***
non-rPSC vs healthy , Country 1 3.508 9.381 0.021 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 0.814 2.115 0.006 0.822 0.822
rPSC vs healthy : Country 1 0.853 2.271 0.009 0.388 0.582
non-rPSC vs healthy : Country 1 1.296 3.486 0.008 0.002 0.006 **

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      patients = filt_colon_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
$`non-rPSC_healthy_CZ`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    1.890 0.01954 5.0623  0.001      0.001
Residual 254   94.839 0.98046                         
Total    255   96.729 1.00000                         

$`non-rPSC_healthy_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Fac        1    3.163 0.04907 8.5656  0.001      0.001
Residual 166   61.293 0.95093                         
Total    167   64.456 1.00000                         

$`non-rPSC_CZ_vs_NO`
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    3.288 0.03219 8.6816  0.001      0.001
Residual 261   98.853 0.96781                         
Total    262  102.141 1.00000                         

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: supplied matrix
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Cov        1    1.516 0.02578 4.2075  0.001      0.001
Residual 159   57.279 0.97422                         
Total    160   58.794 1.00000                         

PCoA

p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p

Saving results

write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))

Univariate Analysis

Main - Genus level

level="genus"
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
# variables

raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect

Genus level

rpsc_effect <- list()

Aggregate taxa

level="genus"
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)
rPSC vs non-rPSC
linDA
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 28 ASV(s)
Removing 5 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  350  samples and  166  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 3 rows containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 12 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 98 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  248  samples and  178  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 31 rows containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 12 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")
Removing 39 ASV(s)
Removing 6 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  424  samples and  159  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
Warning: Removed 29 rows containing missing values or values outside the scale
range (`geom_text_repel()`).Warning: Removed 19 rows containing missing values or values outside the scale
range (`geom_text_repel()`).
volcano

Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")
min_clr -2.080021 
max_clr 6.901023 
min_log -4.181218 
max_log 3.872081 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables

pdf("../figures/Q2/dotplot_colon.pdf",
    height =10,width = 4)
p
dev.off()
rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 

Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

Supplementary Analysis

ASV level

level="ASV"
path_maaslin="../intermediate_files/maaslin/Q2/ASV/"
raw_linda_results[[segment]] <- list()
linda_results[[segment]] <- list()
supplements_wb <- createWorkbook()
rPSC vs non-rPSC
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 1358 ASV(s)
Removing 17 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  350  samples and  336  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

Volcano plot

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
rPSC vs healthy
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_asv_tab,colon_taxa_tab,
                            colon_metadata,group, usage="linDA")
Removing 1644 ASV(s)
Removing 43 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  248  samples and  438  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 527 ASV(s)
Removing 54 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  423  samples and  383  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab)
min_clr -1.227991 
max_clr 4.635178 
min_log -5.321387 
max_log 6.195274 
dotheatmap_linda

rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 

Phylum level

level="phylum"
path_maaslin="../intermediate_files/maaslin/Q2/Phylum/"
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum[[segment]] <- list()

Aggregate taxa

phylum_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = "Phylum")

colon_phylum_tab <- phylum_data[[1]]
colon_phylum_taxa_tab <- phylum_data[[2]]
rPSC vs non-rPSC
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 1 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  350  samples and  10  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

Volcano plot

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
rPSC vs healthy
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,group, usage="linDA")
Removing 2 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  248  samples and  10  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

linDA


# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 2 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  424  samples and  9  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))
Using Phylum for naming
volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")
Using Phylum for naming
volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")
Using Phylum for naming
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

Country - Union

list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)

Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab)
min_clr -4.222255 
max_clr 4.824442 
min_log -3.334323 
max_log 3.037521 
dotheatmap_linda

rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 

Saving results

# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect,
          file.path(path,paste0("supplements_rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))

Machine learning

path = "../results/Q2/models"

ElasticNet

model="enet"

ASV level

level="ASV"
rPSC vs non-rPSC
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.20000000
lambda                          0.02642212
auc                             0.99995630
auc_czech                       0.99987060
auc_no                          1.00000000
auc_optimism_corrected          0.68930277
auc_optimism_corrected_CIL      0.58182584
auc_optimism_corrected_CIU      0.79550463
accuracy                        0.99714286
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.75877572
accuracy_optimism_corrected_CIL 0.72626823
accuracy_optimism_corrected_CIU 0.79732143
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 262   1
   1   0  87

$czech
    Predicted
True   0   1
   0 160   1
   1   0  48

$no
    Predicted
True   0   1
   0 102   0
   1   0  39
enet_model$plot

roc_c

rPSC vs healthy
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.800000000
lambda                          0.004696739
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.984090282
auc_optimism_corrected_CIL      0.963868802
auc_optimism_corrected_CIU      0.999092742
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.913881132
accuracy_optimism_corrected_CIL 0.847549020
accuracy_optimism_corrected_CIU 0.973152174
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0  87

$czech
    Predicted
True  0  1
   0 95  0
   1  0 48

$no
    Predicted
True  0  1
   0 66  0
   1  0 39
enet_model$plot


roc_c

non-rPSC vs healthy
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.600000000
lambda                          0.005254429
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.956087245
auc_optimism_corrected_CIL      0.923453896
auc_optimism_corrected_CIU      0.978627881
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.884857150
accuracy_optimism_corrected_CIL 0.822568090
accuracy_optimism_corrected_CIU 0.926187201
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 262

$czech
    Predicted
True   0   1
   0  95   0
   1   0 160

$no
    Predicted
True   0   1
   0  66   0
   1   0 102
enet_model$plot


roc_c

rPSC effect
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.8000000
lambda                          0.0217535
auc                             0.9212447
auc_czech                       0.9039855
auc_no                          0.9441931
auc_optimism_corrected          0.7892446
auc_optimism_corrected_CIL      0.7051772
auc_optimism_corrected_CIU      0.8440312
accuracy                        0.8400000
accuracy_czech                        NaN
accuracy_no                     0.8865248
accuracy_optimism_corrected     0.7720808
accuracy_optimism_corrected_CIL 0.7394864
accuracy_optimism_corrected_CIU 0.7990527
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 254   9
   1  47  40

$czech
    Predicted
True   0   1
   0 152   9
   1  31  17

$no
    Predicted
True   0   1
   0 102   0
   1  16  23
enet_model$plot


roc_c

Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
rPSC vs non-rPSC
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.03550553
auc                             0.99707181
auc_czech                       0.99495342
auc_no                          1.00000000
auc_optimism_corrected          0.61631007
auc_optimism_corrected_CIL      0.52956470
auc_optimism_corrected_CIU      0.73363312
accuracy                        0.98571429
accuracy_czech                         NaN
accuracy_no                     0.98581560
accuracy_optimism_corrected     0.69300993
accuracy_optimism_corrected_CIL 0.62137380
accuracy_optimism_corrected_CIU 0.73052686
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 262   1
   1   4  83

$czech
    Predicted
True   0   1
   0 160   1
   1   2  46

$no
    Predicted
True   0   1
   0 102   0
   1   2  37
enet_model$plot


roc_c

rPSC vs healthy
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                        [,1]
alpha                           0.8000000000
lambda                          0.0001910911
auc                             1.0000000000
auc_czech                       1.0000000000
auc_no                          1.0000000000
auc_optimism_corrected          0.9578010737
auc_optimism_corrected_CIL      0.9109651085
auc_optimism_corrected_CIU      0.9970643939
accuracy                        1.0000000000
accuracy_czech                           NaN
accuracy_no                     1.0000000000
accuracy_optimism_corrected     0.9191192622
accuracy_optimism_corrected_CIL 0.8130021521
accuracy_optimism_corrected_CIU 0.9849584399
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0  87

$czech
    Predicted
True  0  1
   0 95  0
   1  0 48

$no
    Predicted
True  0  1
   0 66  0
   1  0 39
enet_model$plot


roc_c

rPSC vs healthy
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group",
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.200000000
lambda                          0.002866942
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.964410427
auc_optimism_corrected_CIL      0.941685324
auc_optimism_corrected_CIU      0.986722684
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.902089871
accuracy_optimism_corrected_CIL 0.854318182
accuracy_optimism_corrected_CIU 0.925430657
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 263

$czech
    Predicted
True   0   1
   0  95   0
   1   0 161

$no
    Predicted
True   0   1
   0  66   0
   1   0 102
enet_model$plot

roc_c

rPSC effect
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.60000000
lambda                          0.05398387
auc                             0.81010445
auc_czech                       0.80421843
auc_no                          0.80869784
auc_optimism_corrected          0.74403921
auc_optimism_corrected_CIL      0.70796536
auc_optimism_corrected_CIU      0.77093481
accuracy                        0.80571429
accuracy_czech                         NaN
accuracy_no                     0.78014184
accuracy_optimism_corrected     0.74738866
accuracy_optimism_corrected_CIL 0.70677659
accuracy_optimism_corrected_CIU 0.79792542
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 262   1
   1  67  20

$czech
    Predicted
True   0   1
   0 160   1
   1  36  12

$no
    Predicted
True   0   1
   0 102   0
   1  31   8
enet_model$plot


roc_c

Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

Supplementary models

CLR-transformed data

kNN
model="knn"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9118701
auc_optimism_corrected           0.6193951
auc_optimism_corrected_CIL       0.5321155
auc_optimism_corrected_CIU       0.6960323
accuracy                         0.8457143
accuracy_optimism_corrected      0.7198609
accuracy_optimism_corrected_CIL  0.6159331
accuracy_optimism_corrected_CIU  0.7970677
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9846862
auc_optimism_corrected           0.9336959
auc_optimism_corrected_CIL       0.9016284
auc_optimism_corrected_CIU       0.9715522
accuracy                         0.9233871
accuracy_optimism_corrected      0.8776633
accuracy_optimism_corrected_CIL  0.8257353
accuracy_optimism_corrected_CIU  0.9427339
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               13.0000000
auc                              0.9580390
auc_optimism_corrected           0.8644098
auc_optimism_corrected_CIL       0.8002895
auc_optimism_corrected_CIU       0.9047383
accuracy                         0.9078014
accuracy_optimism_corrected      0.7857898
accuracy_optimism_corrected_CIL  0.7416603
accuracy_optimism_corrected_CIU  0.8331744
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9305537
auc_optimism_corrected           0.7628238
auc_optimism_corrected_CIL       0.6612676
auc_optimism_corrected_CIU       0.8130056
accuracy                         0.7942857
accuracy_optimism_corrected      0.7591353
accuracy_optimism_corrected_CIL  0.7089049
accuracy_optimism_corrected_CIU  0.8071954
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               12.0000000
auc                              0.9046370
auc_optimism_corrected           0.6192079
auc_optimism_corrected_CIL       0.5166357
auc_optimism_corrected_CIU       0.7782418
accuracy                         0.7771429
accuracy_optimism_corrected      0.7540616
accuracy_optimism_corrected_CIL  0.6551291
accuracy_optimism_corrected_CIU  0.8465071
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9907546
auc_optimism_corrected           0.9111602
auc_optimism_corrected_CIL       0.8565643
auc_optimism_corrected_CIU       0.9728443
accuracy                         0.8225806
accuracy_optimism_corrected      0.8201925
accuracy_optimism_corrected_CIL  0.7343137
accuracy_optimism_corrected_CIU  0.8971792
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               18.0000000
auc                              0.9571003
auc_optimism_corrected           0.8711618
auc_optimism_corrected_CIL       0.8101521
auc_optimism_corrected_CIU       0.9341724
accuracy                         0.7900943
accuracy_optimism_corrected      0.7208789
accuracy_optimism_corrected_CIL  0.6418008
accuracy_optimism_corrected_CIU  0.7879489
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.8893186
auc_optimism_corrected           0.7902483
auc_optimism_corrected_CIL       0.7230248
auc_optimism_corrected_CIU       0.8497950
accuracy                         0.8085714
accuracy_optimism_corrected      0.7628243
accuracy_optimism_corrected_CIL  0.7185538
accuracy_optimism_corrected_CIU  0.8085448
roc_c

Random Forest
model="rf"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "81"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.647063" 
auc_optimism_corrected_CIL      "0.5129271"
auc_optimism_corrected_CIU      "0.7859063"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7630961"
accuracy_optimism_corrected_CIL "0.6697138"
accuracy_optimism_corrected_CIU "0.826758" 
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "399"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.9435052"
auc_optimism_corrected_CIL      "0.9084575"
auc_optimism_corrected_CIU      "0.9855619"
accuracy                        "1"        
accuracy_optimism_corrected     "0.839974" 
accuracy_optimism_corrected_CIL "0.7651471"
accuracy_optimism_corrected_CIU "0.9229859"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "105"      
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.902604" 
auc_optimism_corrected_CIL      "0.818777" 
auc_optimism_corrected_CIU      "0.9671069"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8518032"
accuracy_optimism_corrected_CIL "0.7927907"
accuracy_optimism_corrected_CIU "0.9166186"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "31"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8775337"
auc_optimism_corrected_CIL      "0.837927" 
auc_optimism_corrected_CIU      "0.9428915"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8393827"
accuracy_optimism_corrected_CIL "0.8161337"
accuracy_optimism_corrected_CIU "0.8779587"
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "61"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.6441319"
auc_optimism_corrected_CIL      "0.5280122"
auc_optimism_corrected_CIU      "0.7594818"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7511293"
accuracy_optimism_corrected_CIL "0.6591262"
accuracy_optimism_corrected_CIU "0.8142136"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "17"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9609761"
auc_optimism_corrected_CIL      "0.9309369"
auc_optimism_corrected_CIU      "0.996648" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8907039"
accuracy_optimism_corrected_CIL "0.8462518"
accuracy_optimism_corrected_CIU "0.9622759"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "145"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9327948"
auc_optimism_corrected_CIL      "0.9093843"
auc_optimism_corrected_CIU      "0.9691381"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8753493"
accuracy_optimism_corrected_CIL "0.8543221"
accuracy_optimism_corrected_CIU "0.9174511"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "15"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8589471"
auc_optimism_corrected_CIL      "0.8065168"
auc_optimism_corrected_CIU      "0.8991119"
accuracy                        "1"        
accuracy_optimism_corrected     "0.82244"  
accuracy_optimism_corrected_CIL "0.7632796"
accuracy_optimism_corrected_CIU "0.8857639"
roc_c

Gradient boosting
model="gb"
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6148368
auc_optimism_corrected_CIL        0.5423221
auc_optimism_corrected_CIU        0.6820819
accuracy                          1.0000000
accuracy_optimism_corrected       0.7576371
accuracy_optimism_corrected_CIL   0.6587331
accuracy_optimism_corrected_CIU   0.8067363
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9560002
auc_optimism_corrected_CIL        0.9315334
auc_optimism_corrected_CIU        0.9884204
accuracy                          1.0000000
accuracy_optimism_corrected       0.8506360
accuracy_optimism_corrected_CIL   0.7681373
accuracy_optimism_corrected_CIU   0.9263467
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   30.0000000
auc                               1.0000000
auc_optimism_corrected            0.9464332
auc_optimism_corrected_CIL        0.8709877
auc_optimism_corrected_CIU        0.9805888
accuracy                          1.0000000
accuracy_optimism_corrected       0.8823593
accuracy_optimism_corrected_CIL   0.8261426
accuracy_optimism_corrected_CIU   0.9234599
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8622298
auc_optimism_corrected_CIL        0.8179067
auc_optimism_corrected_CIU        0.9279510
accuracy                          1.0000000
accuracy_optimism_corrected       0.8250690
accuracy_optimism_corrected_CIL   0.7954854
accuracy_optimism_corrected_CIU   0.8745536
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6736680
auc_optimism_corrected_CIL        0.5915391
auc_optimism_corrected_CIU        0.7914096
accuracy                          1.0000000
accuracy_optimism_corrected       0.7602758
accuracy_optimism_corrected_CIL   0.6513808
accuracy_optimism_corrected_CIU   0.8569308
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 1.0000000
shrinkage                         0.1000000
n.minobsinnode                   20.0000000
auc                               1.0000000
auc_optimism_corrected            0.9735137
auc_optimism_corrected_CIL        0.9426804
auc_optimism_corrected_CIU        0.9927191
accuracy                          1.0000000
accuracy_optimism_corrected       0.9007885
accuracy_optimism_corrected_CIL   0.8433943
accuracy_optimism_corrected_CIU   0.9309314
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9642380
auc_optimism_corrected_CIL        0.9495139
auc_optimism_corrected_CIU        0.9860100
accuracy                          1.0000000
accuracy_optimism_corrected       0.9142135
accuracy_optimism_corrected_CIL   0.8871296
accuracy_optimism_corrected_CIU   0.9571992
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         100.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8240318
auc_optimism_corrected_CIL        0.7238738
auc_optimism_corrected_CIU        0.8684794
accuracy                          1.0000000
accuracy_optimism_corrected       0.8132742
accuracy_optimism_corrected_CIL   0.7509221
accuracy_optimism_corrected_CIU   0.8824551
roc_c

Relative abundances

Elastic net
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.800000000
lambda                          0.005377316
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.613054927
auc_optimism_corrected_CIL      0.490857300
auc_optimism_corrected_CIU      0.723265090
accuracy                        0.997142857
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.733453765
accuracy_optimism_corrected_CIL 0.682269385
accuracy_optimism_corrected_CIU 0.798079758
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.800000000
lambda                          0.003746731
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.960651806
auc_optimism_corrected_CIL      0.873676035
auc_optimism_corrected_CIU      0.989528910
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.895125287
accuracy_optimism_corrected_CIL 0.819362745
accuracy_optimism_corrected_CIU 0.972995169
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.60000000
lambda                          0.01268861
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.92948430
auc_optimism_corrected_CIL      0.89166078
auc_optimism_corrected_CIU      0.95882539
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.86754960
accuracy_optimism_corrected_CIL 0.80839950
accuracy_optimism_corrected_CIU 0.91957237
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.60000000
lambda                          0.06073667
auc                             0.88955902
auc_czech                       0.89201605
auc_no                          0.88939165
auc_optimism_corrected          0.79485800
auc_optimism_corrected_CIL      0.71658439
auc_optimism_corrected_CIU      0.85869649
accuracy                        0.80857143
accuracy_czech                         NaN
accuracy_no                     0.81560284
accuracy_optimism_corrected     0.76801287
accuracy_optimism_corrected_CIL 0.72887541
accuracy_optimism_corrected_CIU 0.81859244
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.0000000
lambda                          0.0432959
auc                             0.9950614
auc_czech                       0.9940476
auc_no                          0.9972348
auc_optimism_corrected          0.5433647
auc_optimism_corrected_CIL      0.4248570
auc_optimism_corrected_CIU      0.6886757
accuracy                        0.9542857
accuracy_czech                        NaN
accuracy_no                     0.9503546
accuracy_optimism_corrected     0.6803041
accuracy_optimism_corrected_CIL 0.6065087
accuracy_optimism_corrected_CIU 0.7365934
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.600000000
lambda                          0.002205278
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.924618301
auc_optimism_corrected_CIL      0.870002633
auc_optimism_corrected_CIU      0.976756973
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.867986390
accuracy_optimism_corrected_CIL 0.814881321
accuracy_optimism_corrected_CIU 0.919479726
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.03919916
auc                             0.99981107
auc_czech                       0.99960771
auc_no                          1.00000000
auc_optimism_corrected          0.91708414
auc_optimism_corrected_CIL      0.87732725
auc_optimism_corrected_CIU      0.95879325
accuracy                        0.99056604
accuracy_czech                         NaN
accuracy_no                     0.99404762
accuracy_optimism_corrected     0.86364351
accuracy_optimism_corrected_CIL 0.81937500
accuracy_optimism_corrected_CIU 0.90429004
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.80000000
lambda                          0.06978427
auc                             0.75075390
auc_czech                       0.73770704
auc_no                          0.74358974
auc_optimism_corrected          0.67281383
auc_optimism_corrected_CIL      0.58167543
auc_optimism_corrected_CIU      0.74475106
accuracy                        0.75714286
accuracy_czech                         NaN
accuracy_no                     0.73758865
accuracy_optimism_corrected     0.74781625
accuracy_optimism_corrected_CIL 0.69903668
accuracy_optimism_corrected_CIU 0.80869956
roc_c

kNN
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               17.0000000
auc                              0.8139067
auc_optimism_corrected           0.4438631
auc_optimism_corrected_CIL       0.3291721
auc_optimism_corrected_CIU       0.5881490
accuracy                         0.7571429
accuracy_optimism_corrected      0.6934797
accuracy_optimism_corrected_CIL  0.5889706
accuracy_optimism_corrected_CIU  0.7667208
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9629114
auc_optimism_corrected           0.7843430
auc_optimism_corrected_CIL       0.6912901
auc_optimism_corrected_CIU       0.8648065
accuracy                         0.8145161
accuracy_optimism_corrected      0.7400341
accuracy_optimism_corrected_CIL  0.6227501
accuracy_optimism_corrected_CIU  0.8273529
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               29.0000000
auc                              0.8944811
auc_optimism_corrected           0.7938669
auc_optimism_corrected_CIL       0.7349389
auc_optimism_corrected_CIU       0.8782914
accuracy                         0.8274232
accuracy_optimism_corrected      0.7413141
accuracy_optimism_corrected_CIL  0.6641102
accuracy_optimism_corrected_CIU  0.8439400
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.8733884
auc_optimism_corrected           0.7140626
auc_optimism_corrected_CIL       0.6480679
auc_optimism_corrected_CIU       0.7894333
accuracy                         0.8028571
accuracy_optimism_corrected      0.7515490
accuracy_optimism_corrected_CIL  0.6955300
accuracy_optimism_corrected_CIU  0.8365275
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9109523
auc_optimism_corrected           0.6234391
auc_optimism_corrected_CIL       0.5003192
auc_optimism_corrected_CIU       0.7414446
accuracy                         0.8000000
accuracy_optimism_corrected      0.7494729
accuracy_optimism_corrected_CIL  0.6423317
accuracy_optimism_corrected_CIU  0.8130760
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               10.0000000
auc                              0.9592704
auc_optimism_corrected           0.7557076
auc_optimism_corrected_CIL       0.6161263
auc_optimism_corrected_CIU       0.8674111
accuracy                         0.7620968
accuracy_optimism_corrected      0.7306721
accuracy_optimism_corrected_CIL  0.6425903
accuracy_optimism_corrected_CIU  0.7947059
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               11.0000000
auc                              0.9436270
auc_optimism_corrected           0.8396098
auc_optimism_corrected_CIL       0.7714357
auc_optimism_corrected_CIU       0.8993877
accuracy                         0.8773585
accuracy_optimism_corrected      0.7730456
accuracy_optimism_corrected_CIL  0.7068539
accuracy_optimism_corrected_CIU  0.8297749
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
                                      [,1]
k                               16.0000000
auc                              0.7672960
auc_optimism_corrected           0.6409962
auc_optimism_corrected_CIL       0.5269546
auc_optimism_corrected_CIU       0.7463867
accuracy                         0.7657143
accuracy_optimism_corrected      0.7423961
accuracy_optimism_corrected_CIL  0.7085608
accuracy_optimism_corrected_CIU  0.7772584
roc_c

Random Forest
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "245"      
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.6648962"
auc_optimism_corrected_CIL      "0.5579935"
auc_optimism_corrected_CIU      "0.7732615"
accuracy                        "1"        
accuracy_optimism_corrected     "0.7906038"
accuracy_optimism_corrected_CIL "0.7181079"
accuracy_optimism_corrected_CIU "0.8579319"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "47"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9709215"
auc_optimism_corrected_CIL      "0.9461604"
auc_optimism_corrected_CIU      "0.9992061"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8739657"
accuracy_optimism_corrected_CIL "0.8207317"
accuracy_optimism_corrected_CIU "0.9642048"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "39"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9434682"
auc_optimism_corrected_CIL      "0.9109581"
auc_optimism_corrected_CIU      "0.968517" 
accuracy                        "1"        
accuracy_optimism_corrected     "0.8756203"
accuracy_optimism_corrected_CIL "0.8261426"
accuracy_optimism_corrected_CIU "0.9348684"
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "7"        
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.892496" 
auc_optimism_corrected_CIL      "0.8406943"
auc_optimism_corrected_CIU      "0.9559446"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8395267"
accuracy_optimism_corrected_CIL "0.799001" 
accuracy_optimism_corrected_CIU "0.877986" 
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "61"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.6592904"
auc_optimism_corrected_CIL      "0.5159226"
auc_optimism_corrected_CIU      "0.8034246"
accuracy                        "1"        
accuracy_optimism_corrected     "0.751142" 
accuracy_optimism_corrected_CIL "0.637694" 
accuracy_optimism_corrected_CIU "0.8387735"
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "49"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9665928"
auc_optimism_corrected_CIL      "0.9402272"
auc_optimism_corrected_CIU      "0.9911378"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8922367"
accuracy_optimism_corrected_CIL "0.8499717"
accuracy_optimism_corrected_CIU "0.9318095"
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "61"       
splitrule                       "gini"     
min.node.size                   "5"        
auc                             "1"        
auc_optimism_corrected          "0.9434458"
auc_optimism_corrected_CIL      "0.9158285"
auc_optimism_corrected_CIU      "0.9854407"
accuracy                        "1"        
accuracy_optimism_corrected     "0.8839496"
accuracy_optimism_corrected_CIL "0.8487472"
accuracy_optimism_corrected_CIU "0.919606" 
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
                                [,1]       
mtry                            "31"       
splitrule                       "gini"     
min.node.size                   "2"        
auc                             "1"        
auc_optimism_corrected          "0.8495906"
auc_optimism_corrected_CIL      "0.7950936"
auc_optimism_corrected_CIU      "0.8966604"
accuracy                        "1"        
accuracy_optimism_corrected     "0.825695" 
accuracy_optimism_corrected_CIL "0.7573706"
accuracy_optimism_corrected_CIU "0.8824551"
roc_c

Gradient boosting
ASV level
level="ASV"

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6240362
auc_optimism_corrected_CIL        0.5222934
auc_optimism_corrected_CIU        0.7546738
accuracy                          1.0000000
accuracy_optimism_corrected       0.7661990
accuracy_optimism_corrected_CIL   0.7053971
accuracy_optimism_corrected_CIU   0.8279869
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 1644 ASV(s)
Removing 43 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9706600
auc_optimism_corrected_CIL        0.9444854
auc_optimism_corrected_CIU        0.9970412
accuracy                          1.0000000
accuracy_optimism_corrected       0.8940968
accuracy_optimism_corrected_CIL   0.8323888
accuracy_optimism_corrected_CIU   0.9670908
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)
Removing 527 ASV(s)
Removing 54 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   30.0000000
auc                               1.0000000
auc_optimism_corrected            0.9537613
auc_optimism_corrected_CIL        0.9083638
auc_optimism_corrected_CIU        0.9737677
accuracy                          1.0000000
accuracy_optimism_corrected       0.8841500
accuracy_optimism_corrected_CIL   0.8117410
accuracy_optimism_corrected_CIU   0.9152636
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 1358 ASV(s)
Removing 17 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8681106
auc_optimism_corrected_CIL        0.7814398
auc_optimism_corrected_CIU        0.9406792
accuracy                          1.0000000
accuracy_optimism_corrected       0.8298620
accuracy_optimism_corrected_CIL   0.7655238
accuracy_optimism_corrected_CIU   0.9041700
roc_c

Genus level
level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

rPSC vs non-rPSC

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.6462040
auc_optimism_corrected_CIL        0.5751847
auc_optimism_corrected_CIU        0.7875026
accuracy                          1.0000000
accuracy_optimism_corrected       0.7712399
accuracy_optimism_corrected_CIL   0.6992055
accuracy_optimism_corrected_CIU   0.8481626
roc_c

rPSC vs healthy

group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 98 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 3.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9824653
auc_optimism_corrected_CIL        0.9736756
auc_optimism_corrected_CIU        0.9971742
accuracy                          1.0000000
accuracy_optimism_corrected       0.9155915
accuracy_optimism_corrected_CIL   0.8790172
accuracy_optimism_corrected_CIU   0.9439629
roc_c

non-rPSC vs healthy

group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)
Removing 39 ASV(s)
Removing 6 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         200.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.9664948
auc_optimism_corrected_CIL        0.9412675
auc_optimism_corrected_CIU        0.9824918
accuracy                          1.0000000
accuracy_optimism_corrected       0.9126731
accuracy_optimism_corrected_CIL   0.8841197
accuracy_optimism_corrected_CIU   0.9393719
roc_c

rPSC effect

model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)
Warning: Ignoring unknown aesthetics: by
# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
                                       [,1]
n.trees                         500.0000000
interaction.depth                 5.0000000
shrinkage                         0.1000000
n.minobsinnode                   10.0000000
auc                               1.0000000
auc_optimism_corrected            0.8236631
auc_optimism_corrected_CIL        0.7353234
auc_optimism_corrected_CIU        0.8705153
accuracy                          1.0000000
accuracy_optimism_corrected       0.8177658
accuracy_optimism_corrected_CIL   0.7556128
accuracy_optimism_corrected_CIU   0.8545296
roc_c

Saving results

models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

Results overview

Alpha diversity

pc_observed[[segment]]
pc_shannon[[segment]]
pc_simpson[[segment]]
pc_pielou[[segment]]

Plots

alpha_div_plots[[paste(segment,"Country")]]

alpha_div_plots[[paste(segment,"Custom")]]

Beta diversity

Main results

pairwise_aitchison_raw[[paste("genus", segment)]]

PCA

pca_plots_list[[paste(segment,"genus custom")]]

Supplements

knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta))],
             digits = 3,
             caption = "Supplementary PERMANOVA results")
Supplementary PERMANOVA results
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.251 1.208 0.008 0.241 0.241
rPSC vs healthy 1 0.813 4.432 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.241 6.538 0.034 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.449 6.976 0.049 0.001 0.001 ***
rPSC vs healthy , Country 1 0.972 5.300 0.047 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.336 7.038 0.037 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.151 0.724 0.005 0.803 0.803
rPSC vs healthy : Country 1 0.287 1.572 0.014 0.067 0.100
non-rPSC vs healthy : Country 1 0.354 1.873 0.010 0.011 0.033 *
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.337 1.138 0.008 0.241 0.241
rPSC vs healthy 1 0.887 3.269 0.030 0.001 0.002 **
non-rPSC vs healthy 1 1.349 4.846 0.026 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.460 4.927 0.035 0.001 0.001 ***
rPSC vs healthy , Country 1 1.092 4.023 0.036 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.476 5.300 0.029 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.242 0.817 0.006 0.823 0.823
rPSC vs healthy : Country 1 0.342 1.261 0.011 0.118 0.177
non-rPSC vs healthy : Country 1 0.445 1.604 0.009 0.014 0.042 *
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 316.296 1.176 0.008 0.071 0.071
rPSC vs healthy 1 830.867 2.897 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1156.137 4.137 0.023 0.001 0.002 **
rPSC vs non-rPSC , Country 1 639.519 2.378 0.017 0.001 0.001 ***
rPSC vs healthy , Country 1 598.496 2.087 0.019 0.001 0.001 ***
non-rPSC vs healthy , Country 1 752.840 2.694 0.015 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 222.179 0.825 0.006 0.960 0.960
rPSC vs healthy : Country 1 262.783 0.916 0.008 0.746 0.960
non-rPSC vs healthy : Country 1 291.928 1.045 0.006 0.333 0.960
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.505 1.557 0.011 0.026 0.026 *
rPSC vs healthy 1 1.342 4.364 0.039 0.001 0.002 **
non-rPSC vs healthy 1 1.979 6.424 0.034 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.444 4.451 0.032 0.001 0.001 ***
rPSC vs healthy , Country 1 1.059 3.444 0.031 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.394 4.525 0.024 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.333 1.027 0.007 0.386 0.386
rPSC vs healthy : Country 1 0.390 1.272 0.011 0.093 0.140
non-rPSC vs healthy : Country 1 0.447 1.456 0.008 0.036 0.108
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.518 1.318 0.009 0.029 0.029 *
rPSC vs healthy 1 1.117 2.934 0.027 0.001 0.002 **
non-rPSC vs healthy 1 1.607 4.216 0.023 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1.214 3.089 0.022 0.001 0.001 ***
rPSC vs healthy , Country 1 0.939 2.467 0.023 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1.203 3.157 0.017 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.390 0.992 0.007 0.469 0.469
rPSC vs healthy : Country 1 0.430 1.130 0.010 0.131 0.196
non-rPSC vs healthy : Country 1 0.501 1.318 0.007 0.026 0.078
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.504 2.488 0.007 0.267 0.267
rPSC vs healthy 1 2.464 13.675 0.051 0.001 0.002 **
non-rPSC vs healthy 1 3.227 18.295 0.039 0.001 0.002 **
rPSC vs non-rPSC , Country 1 5.037 24.881 0.066 0.001 0.001 ***
rPSC vs healthy , Country 1 2.136 11.852 0.044 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.670 26.476 0.057 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.446 2.209 0.006 0.460 0.460
rPSC vs healthy : Country 1 0.495 2.765 0.010 0.102 0.153
non-rPSC vs healthy : Country 1 1.025 5.881 0.012 0.001 0.003 **
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.590 2.035 0.006 0.594 0.594
rPSC vs healthy 1 2.438 9.075 0.035 0.001 0.002 **
non-rPSC vs healthy 1 3.380 12.658 0.028 0.001 0.002 **
rPSC vs non-rPSC , Country 1 4.994 17.225 0.047 0.001 0.001 ***
rPSC vs healthy , Country 1 2.338 8.704 0.033 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.864 18.215 0.040 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.589 2.039 0.006 0.696 0.696
rPSC vs healthy : Country 1 0.632 2.365 0.009 0.208 0.312
non-rPSC vs healthy : Country 1 1.174 4.433 0.010 0.001 0.003 **
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 470.424 1.893 0.005 0.968 0.968
rPSC vs healthy 1 1842.677 6.979 0.027 0.001 0.002 **
non-rPSC vs healthy 1 2475.754 9.500 0.022 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1644.116 6.616 0.019 0.001 0.001 ***
rPSC vs healthy , Country 1 1119.321 4.239 0.017 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1833.401 7.035 0.016 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 470.881 1.900 0.005 0.989 0.989
rPSC vs healthy : Country 1 517.888 1.969 0.008 0.896 0.989
non-rPSC vs healthy : Country 1 611.799 2.355 0.005 0.163 0.489
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.737 2.327 0.006 0.415 0.415
rPSC vs healthy 1 3.308 10.870 0.041 0.001 0.002 **
non-rPSC vs healthy 1 4.794 16.082 0.036 0.001 0.002 **
rPSC vs non-rPSC , Country 1 4.600 14.526 0.040 0.001 0.001 ***
rPSC vs healthy , Country 1 2.184 7.177 0.027 0.001 0.001 ***
non-rPSC vs healthy , Country 1 4.330 14.527 0.032 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.792 2.512 0.007 0.383 0.383
rPSC vs healthy : Country 1 0.815 2.697 0.010 0.145 0.217
non-rPSC vs healthy : Country 1 1.360 4.601 0.010 0.002 0.006 **
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 0.737 1.909 0.005 0.873 0.873
rPSC vs healthy 1 2.563 6.794 0.026 0.001 0.002 **
non-rPSC vs healthy 1 3.757 10.047 0.023 0.001 0.002 **
rPSC vs non-rPSC , Country 1 3.698 9.578 0.027 0.001 0.001 ***
rPSC vs healthy , Country 1 1.887 5.001 0.019 0.001 0.001 ***
non-rPSC vs healthy , Country 1 3.508 9.381 0.021 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 0.814 2.115 0.006 0.822 0.822
rPSC vs healthy : Country 1 0.853 2.271 0.009 0.388 0.582
non-rPSC vs healthy : Country 1 1.296 3.486 0.008 0.002 0.006 **

PCA

plot_list <- supplements_beta[grepl("PCoA",names(supplements_beta)) &
                              grepl(segment,names(supplements_beta))]

ggarrange(plotlist = plot_list,
          labels=names(plot_list),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)

Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"rPSC effect ASV","rPSC effect Genus","rPSC effect Phylum")),caption="Number of significant taxa")
Number of significant taxa
Count
colon genus non-rPSC vs rPSC 0
colon genus healthy vs rPSC 56
colon genus healthy vs non-rPSC 30
colon ASV non-rPSC vs rPSC 0
colon ASV healthy vs rPSC 98
colon ASV healthy vs non-rPSC 81
colon phylum non-rPSC vs rPSC 1
colon phylum healthy vs rPSC 5
colon phylum healthy vs non-rPSC 4
rPSC effect ASV 31
rPSC effect Genus 47
rPSC effect Phylum 2

Counts

# univar_list <- univariate_statistics(list_intersections,
#                                      psc_effect,
#                                      ileum_genus_asv_taxa_tab)
# 
# univar_df <- univar_list[[1]]
# wb <- univar_list[[2]]
# 
# # save the results
# saveWorkbook(wb,"results/Q1/DAA_final_terminal_ileum.xlsx", overwrite = TRUE)
# 
# # see the results
# univar_df

Machine learning

Main models

Summary

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
rPSC vs non-rPSC ASV colon 0.2 0.026 0.689 0.582 0.796
rPSC vs healthy ASV colon 0.8 0.005 0.984 0.964 0.999
non-rPSC vs healthy ASV colon 0.6 0.005 0.956 0.923 0.979
rPSC effect ASV colon 0.8 0.022 0.789 0.705 0.844
rPSC vs non-rPSC genus colon 0.0 0.036 0.616 0.530 0.734
rPSC vs healthy genus colon 0.8 0.000 0.958 0.911 0.997
non-rPSC vs healthy genus colon 0.2 0.003 0.964 0.942 0.987
rPSC effect genus colon 0.6 0.054 0.744 0.708 0.771

ROC - ASV level

roc_curve_all_custom(roc_cs[c(9:12)], 
                     Q="Q2",
                     model_name="enet_model")
[1] "rPSC vs non-rPSC ASV colon"    "rPSC vs healthy ASV colon"    
[3] "non-rPSC vs healthy ASV colon" "rPSC effect ASV colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values

ROC - Genus level

roc_curve_all_custom(roc_cs[c(13:16)],Q="Q2",
                     model_name="enet_model")
[1] "rPSC vs non-rPSC genus colon"    "rPSC vs healthy genus colon"    
[3] "non-rPSC vs healthy genus colon" "rPSC effect genus colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values

Supplementary models

Summary

# Build final dataframe
models_list[["enet_model"]] <- rbind(models_summ_df_ileum,models_summ_df_colon)
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
All models
row_names knn_model rf_model gbm_model enet_model_ra knn_model_ra rf_model_ra gbm_model_ra enet_model
rPSC vs non-rPSC ASV terminal_ileum 0.569 (0.377; 0.699) 0.585 (0.498; 0.667) 0.517 (0.402; 0.612) 0.6 (0.5; 0.779) 0.436 (0.344; 0.606) 0.629 (0.544; 0.752) 0.507 (0.406; 0.599) 0.593 (0.477; 0.697)
rPSC vs healthy ASV terminal_ileum 0.89 (0.82; 0.947) 0.921 (0.794; 0.973) 0.918 (0.854; 0.957) 0.953 (0.888; 0.989) 0.738 (0.585; 0.891) 0.94 (0.868; 0.992) 0.95 (0.915; 0.961) 0.957 (0.896; 0.986)
non-rPSC vs healthy ASV terminal_ileum 0.843 (0.763; 0.918) 0.876 (0.809; 0.959) 0.918 (0.879; 0.976) 0.906 (0.829; 0.958) 0.796 (0.732; 0.894) 0.909 (0.861; 0.963) 0.922 (0.862; 0.984) 0.936 (0.864; 0.978)
rPSC effect ASV terminal_ileum 0.637 (0.514; 0.758) 0.551 (0.342; 0.686) 0.609 (0.484; 0.671) 0.569 (0.475; 0.665) 0.601 (0.458; 0.708) 0.635 (0.554; 0.712) 0.639 (0.589; 0.71) 0.655 (0.52; 0.767)
rPSC vs non-rPSC genus terminal_ileum 0.477 (0.367; 0.576) 0.604 (0.455; 0.679) 0.534 (0.437; 0.666) 0.59 (0.453; 0.747) 0.525 (0.45; 0.625) 0.634 (0.561; 0.742) 0.449 (0.361; 0.541) 0.539 (0.426; 0.635)
rPSC vs healthy genus terminal_ileum 0.879 (0.724; 0.973) 0.941 (0.859; 0.982) 0.941 (0.878; 0.985) 0.852 (0.665; 0.943) 0.728 (0.564; 0.872) 0.95 (0.872; 0.994) 0.927 (0.861; 0.987) 0.924 (0.8; 0.982)
non-rPSC vs healthy genus terminal_ileum 0.902 (0.855; 0.949) 0.903 (0.83; 0.964) 0.923 (0.874; 0.966) 0.909 (0.836; 0.96) 0.827 (0.722; 0.889) 0.93 (0.866; 0.984) 0.952 (0.9; 0.995) 0.946 (0.872; 0.982)
rPSC effect genus terminal_ileum 0.581 (0.476; 0.734) 0.599 (0.528; 0.661) 0.608 (0.446; 0.754) 0.536 (0.455; 0.682) 0.546 (0.386; 0.653) 0.616 (0.551; 0.711) 0.615 (0.509; 0.777) 0.673 (0.604; 0.743)
rPSC vs non-rPSC ASV colon 0.619 (0.532; 0.696) 0.647 (0.513; 0.786) 0.615 (0.542; 0.682) 0.613 (0.491; 0.723) 0.444 (0.329; 0.588) 0.665 (0.558; 0.773) 0.624 (0.522; 0.755) 0.689 (0.582; 0.796)
rPSC vs healthy ASV colon 0.934 (0.902; 0.972) 0.944 (0.908; 0.986) 0.956 (0.932; 0.988) 0.961 (0.874; 0.99) 0.784 (0.691; 0.865) 0.971 (0.946; 0.999) 0.971 (0.944; 0.997) 0.984 (0.964; 0.999)
non-rPSC vs healthy ASV colon 0.864 (0.8; 0.905) 0.903 (0.819; 0.967) 0.946 (0.871; 0.981) 0.929 (0.892; 0.959) 0.794 (0.735; 0.878) 0.943 (0.911; 0.969) 0.954 (0.908; 0.974) 0.956 (0.923; 0.979)
rPSC effect ASV colon 0.763 (0.661; 0.813) 0.878 (0.838; 0.943) 0.862 (0.818; 0.928) 0.795 (0.717; 0.859) 0.714 (0.648; 0.789) 0.892 (0.841; 0.956) 0.868 (0.781; 0.941) 0.789 (0.705; 0.844)
rPSC vs non-rPSC genus colon 0.619 (0.517; 0.778) 0.644 (0.528; 0.759) 0.674 (0.592; 0.791) 0.543 (0.425; 0.689) 0.623 (0.5; 0.741) 0.659 (0.516; 0.803) 0.646 (0.575; 0.788) 0.616 (0.53; 0.734)
rPSC vs healthy genus colon 0.911 (0.857; 0.973) 0.961 (0.931; 0.997) 0.974 (0.943; 0.993) 0.925 (0.87; 0.977) 0.756 (0.616; 0.867) 0.967 (0.94; 0.991) 0.982 (0.974; 0.997) 0.958 (0.911; 0.997)
non-rPSC vs healthy genus colon 0.871 (0.81; 0.934) 0.933 (0.909; 0.969) 0.964 (0.95; 0.986) 0.917 (0.877; 0.959) 0.84 (0.771; 0.899) 0.943 (0.916; 0.985) 0.966 (0.941; 0.982) 0.964 (0.942; 0.987)
rPSC effect genus colon 0.79 (0.723; 0.85) 0.859 (0.807; 0.899) 0.824 (0.724; 0.868) 0.673 (0.582; 0.745) 0.641 (0.527; 0.746) 0.85 (0.795; 0.897) 0.824 (0.735; 0.871) 0.744 (0.708; 0.771)
write.csv(final_df,file=file.path(path,"AUC_all_models.csv"),row.names = FALSE)

ROC - ASV

rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(1:4)],
                       Q="Q2",
                       model_name=model_name)
}
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC ASV terminal_ileum"   
[2] "rPSC vs healthy ASV terminal_ileum"    
[3] "non-rPSC vs healthy ASV terminal_ileum"
[4] "rPSC effect ASV terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))

ROC - genus

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name)
}
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
pdf("../figures/Q2/models_colon.pdf",
    height =10,width = 10)
p
dev.off()

Paper-ready visualizations

Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q2_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,
                      widths = c(1,0.1,1))
Q2_alpha

Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q2_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q2_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q2_alpha,genus_Q2_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

Elastic net

Genus level

models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 8)) 
}
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus terminal_ileum"   
[2] "rPSC vs healthy genus terminal_ileum"    
[3] "non-rPSC vs healthy genus terminal_ileum"
[4] "rPSC effect genus terminal_ileum"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(13:16)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 8))  
}
[1] "rPSC vs non-rPSC genus colon"    "rPSC vs healthy genus colon"    
[3] "non-rPSC vs healthy genus colon" "rPSC effect genus colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus colon"    "rPSC vs healthy genus colon"    
[3] "non-rPSC vs healthy genus colon" "rPSC effect genus colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus colon"    "rPSC vs healthy genus colon"    
[3] "non-rPSC vs healthy genus colon" "rPSC effect genus colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
[1] "rPSC vs non-rPSC genus colon"    "rPSC vs healthy genus colon"    
[3] "non-rPSC vs healthy genus colon" "rPSC effect genus colon"        
Warning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' valuesWarning: collapsing to unique 'x' values
roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ggplot() + theme_minimal(),
                            ncol=4, widths = c(1,0.1,1,0.1))
roc_curve_plot

alpha_beta_elastic <- ggarrange(Q2_alpha,genus_Q2_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,1.2))
alpha_beta_elastic

pdf("../figures/Q2/FIGURE4.pdf",paper = "a4",height = 10,width = 10)
alpha_beta_elastic
dev.off()

Dot heatmap - DAA

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2)
heatmap_plot

pdf("../figures/Q2/FIGURE5.pdf",
    height =10,width = 8,paper="a4")
heatmap_plot
dev.off()

Session info

---
title: "PSC article analysis Q2"
output: html_notebook
---

# Custom Functions


```{r}
source("custom_functions.R")
```


# 2. Hypothesis: 2. PSC recurrence is associated with a specific composition of the gut microbiome

**rPSC vs non-rPSC vs Healthy analysis on merged data**

-   Alpha diversity –\> group effect, country effect, interaction effect

-   Beta diversity – PERMANOVA, PCA -\> group effect, country effect,
    interaction effect

-   DAA -\>

    -   Group effect – linDA + MaAsLin2 intersection

    -   Country effect – linDA + MaAsLin2 union

**rPSC vs non-rPSC** –by this comparison, we will find the effect of
recurrence (whether the microbial composition is associated with it)

**non-rPSC vs Healthy** – by this comparison, we will find if the
non-rPSC samples are „closer“ to the healthy samples then rPSC samples
in terms of the microbial composition. In other words, does non-rPSC
samples have „healthy microbiome“?

**rPSC vs Healthy** – by this comparison, we will find how much are the
rPSC samples different from healthy samples.

However, the rPSC and non-rPSC samples are not different at all (see
results). Therefore, we can do EXCLUSIVE LEFT JOIN of differentially
abundant taxa from the analyses (rPSC vs Healthy, non-rPSC vs Healthy)


## Data Import

Importing ASV, taxa and metadata tables for both czech and norway
samples.

# Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway
samples.

**Czech**

```{r}
path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))
```

**Norway**

```{r}
path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))
```

## Merging data

Merging two countries to create whole dataset

```{r}
asv_tab <- merge(asv_tab_ikem,asv_tab_norway,by="SeqID",all=TRUE)
taxa_tab <- merging_taxa_tables(taxa_tab_ikem,taxa_tab_norway)
```

Merging two countries based on the different matrices - Ileum, Colon.

### Terminal ileum

```{r}
ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q2")

ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]
```

### Colon

```{r}
colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q2")

colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]
```

# Data Analysis - Terminal ileum

```{r}
segment="terminal_ileum"
```

## Filtering

Rules: - prevalence \> 5% (per group) - nearZeroVar with default
settings - sequencing depth \> 5000 - taxonomic assignment at least
order

**Rarefaction Curve**

```{r}
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
```

```{r, eval = FALSE}
ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))
```

```{r}
load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)

prare
```

**Library size**

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(ileum_asv_tab, line = c(5000,10000))
```

### Sequencing depth

```{r}
data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)

filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]
```

**Library size**

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(filt_ileum_asv_tab,line = c(5000,10000))
```

### NearZeroVar

```{r}
data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]
```

Library size

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(filt_ileum_asv_tab,line = c(5000,10000))
```

### Final Counts

```{r}
final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")
```

## Alpha diversity

```{r}
path = "../results/Q2/alpha_diversity"
```

**Calculation**

```{r}
# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
```

```{r}

alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)
```

**Plots**

### Country plot

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha
```

```{r,results='hide'}
pdf("../figures/Q2/alpha_diversity_terminal_ileum.pdf",
    height =4,width = 7)
p_boxplot_alpha
dev.off()
```

### Custom plot

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
alpha_data <- alpha_data %>% 
  dplyr::select(-c("Simpson","Pielou")) %>%
  mutate(Richness=Observe)

p_A <- alpha_diversity_custom_2(alpha_data,
                                size = 1.5,
                                width = 0.3)

# save the results
alpha_div_plots[[paste(segment,"Custom")]] <- p_A

p_A
```

### Linear Model

```{r}
path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))
```

**Richness**

```{r}
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe
```

```{r}
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")
```

**Shannon**

```{r}
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

```

```{r}
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")
```

**Simpson**

```{r}
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}


# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
```

```{r}
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")
```

**Pielou**

```{r}
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
```

```{r}
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")
```
### Saving results

```{r}
alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))
```

## Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed
data), both at ASV and genus level.

### Main analysis - Genus, Aitchison

**Genus level, Aitchison distance**

```{r}
level="genus"
```

```{r}
path = "../results/Q2/beta_diversity"
```

```{r}
pairwise_aitchison_raw <- list()
pca_plots_list <- list()
```

Aggregation, filtering

```{r}
# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)

filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]
```

##### PERMANOVA

```{r}
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
  for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}


```

##### Plots

**Custom**

```{r}
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p
```
```{r,results='hide'}
pdf("../figures/Q2/beta_diversity_terminal_ileum.pdf",
    height =5,width = 5)
p
dev.off()
```

#### Saving results

```{r}
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))
```

### Supplementary analysis

```{r}
supplements_beta <- list()
```

#### Genus level

```{r}
level="genus"
```

##### Bray-Curtis

**PERMANOVA**

```{r}
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}

```

**Plots**

```{r}
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p
```

##### Jaccard

**PERMANOVA**

```{r}
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}

```

**Plots**

*Custom*

```{r}
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p
```

#### ASV level

```{r}
level="ASV"
```

##### Aitchison 

*PERMANOVA*

```{r}
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}
```

**PCoA**

```{r}
p <- pca_plot_custom(filt_ileum_asv_tab,
                           filt_ileum_taxa_tab,
                           filt_ileum_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p
```

##### Bray-Curtis 

*PERMANOVA*

```{r}
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)

```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}
}
```

**PCoA**

```{r}
p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p
```

##### Jaccard

*PERMANOVA*

```{r}
# preparing data frame
pairwise_df <- filt_ileum_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}
}
```

**PCoA**

```{r}
p <- pca_plot_custom(filt_ileum_asv_tab,
                     filt_ileum_taxa_tab,
                     filt_ileum_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p
```

#### Saving results

```{r}
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))
```

## Univariate Analysis

### Main - Genus level

```{r}
level="genus"
```

```{r}
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
```

```{r}
# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# rPSC effect
rpsc_effect <- list()
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)
```

#### rPSC vs non-rPSC

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

##### linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

##### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

##### Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

##### Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

##### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

#### rPSC vs healthy

```{r}
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

##### linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

##### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

##### Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

##### Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

##### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

#### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

##### linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

##### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

##### Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

##### Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

##### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

#### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda
```

Dot heatmap

```{r, fig.width=5, fig.height=10, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(
  list_heatmap,                               uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")
dotheatmap_linda
```

**Horizontal bar plot**

```{r}
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
```


```{r}
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
p

dot_heatmap_ileum <- p
```

```{r,results='hide'}
pdf("../figures/Q2/dotplot_terminal_ileum.pdf",
    height =10,width = 4)
p
dev.off()
```


#### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```

#### Saving results

```{r}
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))
```


### Supplementary Analysis

```{r}
supplements_uni <- list()
supplements_wb <- createWorkbook()
```

#### ASV level

```{r}
level="ASV"
```

```{r}
path_maaslin="../intermediate_files/maaslin/Q2/ASV/"
```

```{r}
raw_linda_results <- list();
raw_linda_results[[segment]] <- list()
linda_results <- list(); 
linda_results[[segment]] <- list()
```

##### rPSC vs non-rPSC

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

 Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### rPSC vs healthy

```{r}
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

 linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

 Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

 linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_asv_tab,
                            ileum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

 Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda
```

Dot heatmap

```{r, fig.width=5, fig.height=17, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
dotheatmap_linda
```

##### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```


#### Phylum level

```{r}
level="phylum"
```

```{r}
path_maaslin="../intermediate_files/maaslin/Q2/Phylum/"
```

```{r}
raw_linda_results_phylum <- list();
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum <- list(); 
linda_results_phylum[[segment]] <- list()
```

Aggregate taxa

```{r}
phylum_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = "Phylum")

ileum_phylum_tab <- phylum_data[[1]]
ileum_phylum_taxa_tab <- phylum_data[[2]]

```

##### rPSC vs non-rPSC

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### rPSC vs healthy

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

 linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano
```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

 Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

 linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(ileum_phylum_tab,
                            ileum_phylum_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_phylum[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_phylum[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# summary statistics
# raw_linda_results_genus <- binomial_statistics(filt_ileum_uni_data,             
#                                             group=group,
#                                             filt_ileum_uni_metadata,
#                                             raw_linda_results_genus,
#                                             segment = "terminal_ileum")

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

```

 MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_ileum_uni_data, 
    input_metadata = filt_ileum_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_name)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

 Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_phylum, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

 Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

 Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_phylum[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda
```

Dot heatmap

```{r, fig.width=5, fig.height=17, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab)
dotheatmap_linda
```

##### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```

#### Saving results

```{r}
# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect,
          file.path(path,paste0("supplements_rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))
```


## Machine learning

```{r}
path = "../results/Q2/models"
```

### ElasticNet

```{r}
model="enet"
```

#### ASV level

```{r}
level="ASV"
```

##### rPSC vs non-rPSC

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot
roc_c
```

##### rPSC vs healthy

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### non-rPSC vs healthy

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### rPSC effect

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

#### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

##### rPSC vs non-rPSC

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### rPSC vs healthy

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### non-rPSC vs healthy

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,group, 
                                     usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot
roc_c
```

##### rPSC effect

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

#### Saving results

```{r}
models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))
```


### Supplementary models

```{r}
supplements_models <- list()
```

#### CLR-transformed data

##### kNN

```{r}
model="knn"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

##### Random Forest

```{r}
model="rf"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

##### Gradient boosting

```{r}
model="gb"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

#### Relative abundances

##### Elastic net

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

##### kNN

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

##### Random Forest

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

##### Gradient boosting

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_asv_tab,
                                                ileum_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

#### Saving results

```{r}
models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

```


## Results overview

#### Alpha diversity

```{r}
pc_observed[[segment]]
pc_shannon[[segment]]
pc_simpson[[segment]]
pc_pielou[[segment]]
```

Plots

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
alpha_div_plots[[paste(segment,"Country")]]
```

```{r, fig.width=7, fig.height=4, fig.fullwidth=TRUE}
alpha_div_plots[[paste(segment,"Custom")]]
```

#### Beta diversity

**Main results**

```{r}
pairwise_aitchison_raw[[paste("genus", segment)]]
```

PCA

```{r, fig.width=5, fig.height=3, fig.fullwidth=TRUE, eval=TRUE}
pca_plots_list[[paste(segment,"genus custom")]]
```

**Supplements**

```{r}
knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta))],
             digits = 3,
             caption = "Supplementary PERMANOVA results")
```

PCA

```{r, fig.width=8, fig.height=7, fig.fullwidth=TRUE, eval=FALSE}
ggarrange(plotlist = supplements_beta[grepl("PCoA",names(supplements_beta))],
          labels=names(supplements_beta[grepl("PCoA",names(supplements_beta))]),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)
```

#### Univariate analysis

**Number of significant taxa**

```{r}
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect ASV","PSC effect Genus")),caption="Number of significant taxa")
```

**Counts**

```{r, eval=FALSE}
# univar_list <- univariate_statistics(list_intersections,
#                                      psc_effect,
#                                      ileum_genus_asv_taxa_tab)
# 
# univar_df <- univar_list[[1]]
# wb <- univar_list[[2]]
# 
# # save the results
# saveWorkbook(wb,"results/Q1/DAA_final_terminal_ileum.xlsx", overwrite = TRUE)
# 
# # see the results
# univar_df
```

#### Machine learning

**Main models**

Summary

```{r}
knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")
```

**ROC - ASV level**

```{r}
roc_curve_all_custom(roc_cs[c(1:4)], 
                     Q="Q2",
                     model_name="enet_model")
```

**ROC - Genus level**

```{r}
roc_curve_all_custom(roc_cs[c(5:8)],Q="Q2",
                     model_name="enet_model")
```

**Supplementary models**

Summary

```{r}
# Build final dataframe
models_list[["enet_model"]] <- models_summ_df_ileum
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
```

ROC - ASV

```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE}
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(1:4)],
                       Q="Q2",
                       model_name=model_name)
}

ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
```

ROC - genus

```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE}
plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name)
}

p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
p
```

```{r,results='hide'}
pdf("../figures/Q2/models_ileum.pdf",
    height =10,width = 10)
p
dev.off()
```

# Analysis - Colon

```{r}
segment="colon"
```

## Filtering

Rules: - prevalence \> 5% (per group) - nearZeroVar with default
settings - sequencing depth \> 5000 - taxonomic assignment at least
order

**Rarefaction Curve**

```{r}
path="../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
```

```{r, eval = FALSE}
ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))
```

```{r}
load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)

prare
```

Library size

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(colon_asv_tab, line = c(5000,10000))
```

### Sequencing depth

```{r}
data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)

filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]
```

Library size

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(filt_colon_asv_tab,line = c(10000))
```

### NearZeroVar

```{r}
data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]
```

Library size

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(filt_colon_asv_tab,line = c(5000,10000))
```

Check zero depth

```{r}
data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]]; 
```

Library size

```{r, fig.width=5, fig.height=4, fig.fullwidth=TRUE}
read_counts(filt_colon_asv_tab,line = c(5000,10000))
```

### Final Counts

```{r}
final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)
```

## Alpha diversity

```{r}
path = "../results/Q2/alpha_diversity"
```

**Calculation**

```{r}
# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
```

```{r}
alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)
```

**Plots**

### Country plot

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha
```

```{r,results='hide'}
pdf("../figures/Q2/alpha_diversity_colon.pdf",
    height =4,width = 7)
p_boxplot_alpha
dev.off()
```

### Custom plot

```{r}
alpha_data <- alpha_data %>% 
  dplyr::select(-c("Simpson","Pielou")) %>%
  mutate(Richness=Observe)

p_B <- alpha_diversity_custom_2(alpha_data,
                                size = 1.5,
                                width = 0.3)

# save the results
alpha_div_plots[[paste(segment,"Custom")]] <- p_B

p_B
```

### Linear Model

```{r}
path = "../results/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))
```

**Richness**

```{r}
results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe
```

```{r}
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
```

**Shannon**

```{r}
results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

```

```{r}
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
```

**Simpson**

```{r}
results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)

```

```{r}
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")
```

**Pielou**

```{r}
results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)

```

```{r}
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
```

### Saving results

```{r}
alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))
```

## Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed
data), both at ASV and genus level.

### Main analysis - Genus, Aitchison

**Genus level, Aitchison distance**

```{r}
level="genus"
```

```{r}
path = "../results/Q2/beta_diversity"
```

Aggregation, filtering

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)

filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]
```

##### PERMANOVA

```{r}
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)

```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}

```

##### Plots

**PCoA custom**

```{r}
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

```

```{r,results='hide'}
pdf("../figures/Q2/beta_diversity_colon.pdf",
    height =5,width = 5)
p
dev.off()
```

#### Saving results

```{r}
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))
```

### Supplementary analysis

#### Genus level

```{r}
level="genus"
```

##### Bray-Curtis

**PERMANOVA**

```{r}
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'bray')
  print(result_list)
}

```

**Plots**

```{r}
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "bray",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p
```

##### Jaccard

**PERMANOVA**

```{r}
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country,
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country,
                          patients = filt_colon_genus_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      patients = filt_colon_genus_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}

```

**Plots**

*Custom*

```{r}
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 measure = "jaccard",
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p
```

#### ASV level

```{r}
level="ASV"
```

##### Aitchison

```{r}
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(x=pairwise_df,
                          filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country, 
                           sim.method = "robust.aitchison", 
                           p.adjust.m="BH",
                           patients = filt_colon_metadata$Patient)

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          interaction = TRUE, 
                          sim.method = "robust.aitchison", 
                          p.adjust.m="BH",
                          patients = filt_colon_metadata$Patient)

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("aitchison",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)

# see the results
pp_factor
pp_cov
pp_fac.cov

```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient)
  print(result_list)
} 
}
```

**PCoA**

```{r}
p <- pca_plot_custom(filt_colon_asv_tab,
                           filt_colon_taxa_tab,
                           filt_colon_metadata,
                           show_boxplots = TRUE,
                           variable = "Group", 
                           size=3, 
                           show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA aitchison",level,segment)]] <- p

# see the results
p
```

##### Bray-Curtis 

*PERMANOVA*

```{r}
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "bray", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "bray", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("bray",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)

```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_metadata$Patient,
                      sim.method = 'bray')
  print(result_list)
}

```

**PCoA**

```{r}
p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "bray",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA bray",level,segment)]] <- p

# see the results
p
```

##### Jaccard

*PERMANOVA*

```{r}
# preparing data frame
pairwise_df <- filt_colon_asv_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_colon_metadata$Group,
                           covariate = filt_colon_metadata$Country,
                           patients = filt_colon_metadata$Patient,
                           sim.method = "jaccard", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_colon_metadata$Group,
                          covariate = filt_colon_metadata$Country, 
                          patients = filt_colon_metadata$Patient,
                          interaction = TRUE, sim.method = "jaccard", p.adjust.m="BH")

pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
supplements_beta[[paste("jaccard",level,segment)]] <- rbind(pp_factor, 
                                                            pp_cov, 
                                                            pp_fac.cov)
```

```{r}
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
```

Interaction check

```{r}
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_metadata$Group,
                      covariate = filt_colon_metadata$Country, 
                      patients = filt_colon_metadata$Patient,
                      group1 = group1,
                      group2 = group2,
                      sim.method = 'jaccard')
  print(result_list)
}

```

**PCoA**

```{r}
p <- pca_plot_custom(filt_colon_asv_tab,
                     filt_colon_taxa_tab,
                     filt_colon_metadata,
                     measure = "jaccard",
                     show_boxplots = TRUE,
                     variable = "Group", size=3, show_legend=FALSE)

# save the results
supplements_beta[[paste("PCoA jaccard",level,segment)]] <- p

# see the results
p
```

#### Saving results

```{r}
write.xlsx(supplements_beta[!grepl("PCoA",names(supplements_beta))],
           file = file.path(path,
           paste0("supplements_beta_diversity_", segment,".xlsx")))
```

## Univariate Analysis

### Main - Genus level

```{r}
level="genus"
```

```{r}
# needed paths
path = "../results/Q2/univariate_analysis"
path_maaslin=file.path("../intermediate_files/maaslin/Q2",level)
```

```{r}
# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
rpsc_effect <- list()
```

#### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)
```

##### rPSC vs non-rPSC

###### linDA

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

###### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

###### Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

###### Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

###### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

#### rPSC vs healthy

```{r}
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

##### linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

###### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

###### Group - Intersection

```{r}
intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

###### Country - Union

```{r,eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

###### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

###### linDA

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("Country effect") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

###### MaAsLin2

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")


```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

###### Group - Intersection

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

###### Country - Union

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

###### Interaction effect

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

##### Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda
```

Dot heatmap

```{r, fig.width=5, fig.height=17, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")
dotheatmap_linda
```

**Horizontal bar plot**

```{r}
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
```


```{r}
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
p

dot_heatmap_colon <- p
```

```{r,results='hide'}
pdf("../figures/Q2/dotplot_colon.pdf",
    height =10,width = 4)
p
dev.off()
```


##### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```

#### Saving results

```{r}
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))
```

### Supplementary Analysis

#### ASV level

```{r}
level="ASV"
```

```{r}
path_maaslin="../intermediate_files/maaslin/Q2/ASV/"
```

```{r}
raw_linda_results[[segment]] <- list()
linda_results[[segment]] <- list()
supplements_wb <- createWorkbook()
```

##### rPSC vs non-rPSC

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

Volcano plot

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### rPSC vs healthy

```{r}
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,colon_taxa_tab,
                            colon_metadata,group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano
```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)
```

**Basic statistics**

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_asv_tab,
                            colon_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")
```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda
```

**Dot heatmap**

```{r, fig.width=5, fig.height=25, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab)
dotheatmap_linda
```

##### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```

#### Phylum level

```{r}
level="phylum"
```

```{r}
path_maaslin="../intermediate_files/maaslin/Q2/Phylum/"
```

```{r}
raw_linda_results_phylum[[segment]] <- list()
linda_results_phylum[[segment]] <- list()
```

Aggregate taxa

```{r}
phylum_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = "Phylum")

colon_phylum_tab <- phylum_data[[1]]
colon_phylum_taxa_tab <- phylum_data[[2]]

```

##### rPSC vs non-rPSC

```{r}
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

Volcano plot

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano

```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                                                                      segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                                    filt_colon_uni_data,
                                                    filt_colon_uni_metadata,
                                                    segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### rPSC vs healthy

```{r}
group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata, 
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

# see the results
volcano
```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment=segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                               segment=segment,
                                                    level=level)
```

**Basic statistics**

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### non-rPSC vs healthy

```{r}
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

**linDA**

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}

# prepare the data
linda_data <- binomial_prep(colon_phylum_tab,
                            colon_phylum_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data, 
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
```

```{r, fig.width=15, fig.height=5, fig.fullwidth=TRUE}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
                                taxa_table = filt_colon_uni_taxa) +
              ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                 taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Country effect")

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) + 
              ggtitle("Interaction")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano
```

**MaAsLin2**

```{r, echo=FALSE,results='hide',message=FALSE,warning=FALSE}
fit_data = Maaslin2(
    input_data = filt_colon_uni_data, 
    input_metadata = filt_colon_uni_metadata, min_abundance = 0,
    min_prevalence = 0,min_variance = 0,
    output = file.path(path_maaslin,group1), max_significance = 0.05,
    fixed_effects = c('Group', 'Country'),random_effects = "Patient",
    correction = "BH")

```

```{r}
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano
```

**Group - Intersection**

```{r}
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn
```

**Country - Union**

```{r, eval=FALSE}
list_country_union <- country_union(group,linda.output, fit_data,
                                    segment=segment,
                                    level=level)
```

**Interaction effect**

```{r}
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

## results for czech cohort
list_interaction_significant[[2]]

## results for norwegian cohort
list_interaction_significant[[3]]
```

Removing problematic taxa

```{r}
list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
```

 Basic statistics

```{r}
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results[[segment]][[group1]],
                 by="SeqID",all=TRUE)

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- paste(level,comparison_name)
addWorksheet(supplements_wb, sheetName = new_name)
writeData(supplements_wb, sheet = new_name, uni_df, rowNames=FALSE)
```

##### Visualization

Heatmap visualizing the linDA's logFoldChange for taxa with p \< 0.1.

```{r, fig.width=10, fig.height=17, fig.fullwidth=TRUE}
list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda
```

**Dot heatmap**

```{r, fig.width=5, fig.height=25, fig.fullwidth=TRUE}
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab)
dotheatmap_linda
```

##### rPSC effect

**pre_LTx vs Healthy and Post_LTx vs Healthy intersection**


```{r}
A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]

rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
```

#### Saving results

```{r}
# ALL DATA
saveWorkbook(supplements_wb,file.path(path,paste0("supplements_uni_analysis_wb_",segment,".xlsx")),overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect,
          file.path(path,paste0("supplements_rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("supplements_significant_taxa_",segment,".xlsx")))
```

## Machine learning

```{r}
path = "../results/Q2/models"
```

### ElasticNet

```{r}
model="enet"
```

#### ASV level

```{r}
level="ASV"
```

##### rPSC vs non-rPSC

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot
roc_c
```

##### rPSC vs healthy

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### non-rPSC vs healthy

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### rPSC effect

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

#### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

##### rPSC vs non-rPSC

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### rPSC vs healthy

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

##### rPSC vs healthy

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group",
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot
roc_c
```

##### rPSC effect

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
enet_model$conf_matrices
enet_model$plot

roc_c
```

#### Saving results

```{r}
models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))
```

### Supplementary models

#### CLR-transformed data

##### kNN

```{r}
model="knn"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs


# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

##### Random Forest

```{r}
model="rf"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```


###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

##### Gradient boosting

```{r}
model="gb"
```

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_clr")

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

#### Relative abundances

##### Elastic net

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)


# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
supplements_models[["models_summ"]][["enet_model_ra"]][[model_name]] <- enet_model$model_summary

supplements_models[["roc_cs"]][["enet_model_ra"]][[model_name]] <- enet_model$kfold_rocobjs

# see the results
enet_model$model_summary %>% t()
roc_c
```

##### kNN

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary

supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
knn_model <- knn_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(knn_model, group)

# save the results
# save the results
supplements_models[["models_summ"]][["knn_model_ra"]][[model_name]] <- knn_model$model_summary
supplements_models[["roc_cs"]][["knn_model_ra"]][[model_name]] <- knn_model$kfold_rocobjs

# see the results
knn_model$model_summary %>% t()
roc_c
```

##### Random Forest

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                        clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary

supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs


# see the results
rf_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
rf_model <- rf_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(rf_model, group)

# save the results
supplements_models[["models_summ"]][["rf_model_ra"]][[model_name]] <- rf_model$model_summary
supplements_models[["roc_cs"]][["rf_model_ra"]][[model_name]] <- rf_model$kfold_rocobjs

# see the results
rf_model$model_summary %>% t()
roc_c
```

##### Gradient boosting

###### ASV level

```{r}
level="ASV"
```

***rPSC vs non-rPSC***

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_asv_tab,
                                                colon_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

###### Genus level

```{r}
level="genus"
```

Aggregate taxa

```{r}
genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
```

**rPSC vs non-rPSC**

```{r}
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC vs healthy**

```{r}
group <- c("rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**non-rPSC vs healthy**

```{r}
group <- c("non-rPSC","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
```

```{r}
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_ra",
                                     patient = TRUE)

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                          clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary

supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs

# see the results
gbm_model$model_summary %>% t()
roc_c
```

**rPSC effect**

```{r}
model_name <- paste("rPSC effect",level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                rpsc_effect[[paste(segment,level)]],
                                                usage="ml_ra")

# fit the model
gbm_model <- gbm_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC curve
roc_c <- roc_curve(gbm_model, group)

# save the results
supplements_models[["models_summ"]][["gbm_model_ra"]][[model_name]] <- gbm_model$model_summary
supplements_models[["roc_cs"]][["gbm_model_ra"]][[model_name]] <- gbm_model$kfold_rocobjs


# see the results
gbm_model$model_summary %>% t()
roc_c
```

#### Saving results

```{r}
models_list <- list()

for (model_name in names(supplements_models$models_summ)){
  df <- do.call(rbind, supplements_models$models_summ[[model_name]])
  models_list[[model_name]] <- df
}

write.xlsx(models_list,
           file=file.path(path,paste0("supplements_models_",segment,".xlsx")),
           rowNames=TRUE)

```

## Results overview

#### Alpha diversity

```{r}
pc_observed[[segment]]
pc_shannon[[segment]]
pc_simpson[[segment]]
pc_pielou[[segment]]
```

Plots

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
alpha_div_plots[[paste(segment,"Country")]]
```

```{r, fig.width=7, fig.height=4, fig.fullwidth=TRUE}
alpha_div_plots[[paste(segment,"Custom")]]
```

#### Beta diversity

**Main results**

```{r}
pairwise_aitchison_raw[[paste("genus", segment)]]
```

PCA

```{r, fig.width=5, fig.height=3, fig.fullwidth=TRUE, eval=TRUE}
pca_plots_list[[paste(segment,"genus custom")]]
```

**Supplements**

```{r}
knitr::kable(supplements_beta[!grepl("PCoA",names(supplements_beta))],
             digits = 3,
             caption = "Supplementary PERMANOVA results")
```

PCA

```{r, fig.width=8, fig.height=7, fig.fullwidth=TRUE, eval=TRUE}
plot_list <- supplements_beta[grepl("PCoA",names(supplements_beta)) &
                              grepl(segment,names(supplements_beta))]

ggarrange(plotlist = plot_list,
          labels=names(plot_list),
          font.label = list(size=5,face="plain"),
          ncol=2,nrow=3)
```

#### Univariate analysis

**Number of significant taxa**

```{r}
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"rPSC effect ASV","rPSC effect Genus","rPSC effect Phylum")),caption="Number of significant taxa")
```

**Counts**

```{r, eval=FALSE}
# univar_list <- univariate_statistics(list_intersections,
#                                      psc_effect,
#                                      ileum_genus_asv_taxa_tab)
# 
# univar_df <- univar_list[[1]]
# wb <- univar_list[[2]]
# 
# # save the results
# saveWorkbook(wb,"results/Q1/DAA_final_terminal_ileum.xlsx", overwrite = TRUE)
# 
# # see the results
# univar_df
```

#### Machine learning

**Main models**

Summary

```{r}
knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")
```

**ROC - ASV level**

```{r}
roc_curve_all_custom(roc_cs[c(9:12)], 
                     Q="Q2",
                     model_name="enet_model")
```

**ROC - Genus level**

```{r}
roc_curve_all_custom(roc_cs[c(13:16)],Q="Q2",
                     model_name="enet_model")
```

**Supplementary models**

Summary

```{r}
# Build final dataframe
models_list[["enet_model"]] <- rbind(models_summ_df_ileum,models_summ_df_colon)
final_df <- tibble(row_names = rownames(models_list[[1]]))

# Loop through models and extract required values
for (model_name in names(models_list)) {
  model_df <- models_list[[model_name]]
  
  # Combine AUC_optimism_corrected with its CI values
  final_df[[model_name]] <- paste0(
    round(model_df$auc_optimism_corrected, 3), 
    " (", round(model_df$auc_optimism_corrected_CIL, 3), "; ", 
    round(model_df$auc_optimism_corrected_CIU, 3), ")"
  )
}

knitr::kable(final_df, caption="All models")
write.csv(final_df,file=file.path(path,"AUC_all_models.csv"),row.names = FALSE)
```

ROC - ASV

```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE}
rocs_list <- supplements_models$roc_cs
rocs_list[["enet_model"]] <- roc_cs

plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(1:4)],
                       Q="Q2",
                       model_name=model_name)
}

ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
```

ROC - genus

```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE}
plot_list <- list()

for (model_name in names(rocs_list)) {
  plot_list[[model_name]] <- roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name)
}

p <- ggarrange(plotlist = plot_list,labels = names(rocs_list),font.label = list(face="plain",size=7))
```

```{r,results='hide'}
pdf("../figures/Q2/models_colon.pdf",
    height =10,width = 10)
p
dev.off()
```

# Paper-ready visualizations

## Alpha diversity

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q2_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,
                      widths = c(1,0.1,1))
Q2_alpha
```

## Beta diversity


```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q2_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q2_beta
```

**Alpha + Beta diversity**

```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE}
alpha_beta <- ggarrange(Q2_alpha,genus_Q2_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta
```

## Elastic net

Genus level

```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE}
models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(5:8)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 8)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(13:16)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 8))  
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ggplot() + theme_minimal(),
                            ncol=4, widths = c(1,0.1,1,0.1))
roc_curve_plot

```

```{r, fig.width=10, fig.height=12, fig.fullwidth=TRUE}
alpha_beta_elastic <- ggarrange(Q2_alpha,genus_Q2_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,1.2))
alpha_beta_elastic
```

```{r,results='hide'}
pdf("../figures/Q2/FIGURE4.pdf",paper = "a4",height = 10,width = 10)
alpha_beta_elastic
dev.off()
```

## Dot heatmap - DAA

```{r, fig.width=10, fig.height=13, fig.fullwidth=TRUE}
p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2)
heatmap_plot
```

```{r,results='hide'}
pdf("../figures/Q2/FIGURE5.pdf",
    height =10,width = 8,paper="a4")
heatmap_plot
dev.off()
```

# Session info

```{r}
sessionInfo()
```


<!-- # Paper-ready visualizations -->

<!-- ## Alpha diversity -->

<!-- **Alpha diversity** -->

<!-- ```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE} -->
<!-- p_A <- alpha_div_plots$`Ileum Custom` +  -->
<!--   ggtitle("Terminal ileum")+  -->
<!--   theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) -->

<!-- p_B <-  alpha_div_plots$`Colon Custom` +  -->
<!--   ggtitle("Colon") +  -->
<!--   theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) -->

<!-- Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3, -->
<!--                       widths = c(1,0.1,1)) -->
<!-- Q1_alpha -->
<!-- ``` -->

<!-- ## Beta diversity -->

<!-- ASV level -->

<!-- ```{r} -->

<!-- ``` -->

<!-- Genus level -->

<!-- ```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE} -->
<!-- pca_ti <- pca_plots_list$`Q1 Ileum Genus custom` + theme(legend.position = "none") -->
<!-- pca_colon <- pca_plots_list$`Q1 Colon Genus custom` + theme(legend.position = "none") -->

<!-- genus_Q1_beta <- ggarrange(pca_ti, -->
<!--                            ggplot() + theme_minimal(), -->
<!--                            pca_colon,ncol=3, -->
<!--                            widths = c(1,0.1,1)) -->
<!-- genus_Q1_beta -->
<!-- ``` -->

<!-- **Alpha + Beta diversity** -->

<!-- ```{r, fig.width=10, fig.height=8, fig.fullwidth=TRUE} -->
<!-- alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta, -->
<!--                         ncol = 1,nrow=2,labels = c("A","B")) -->
<!-- alpha_beta -->
<!-- ``` -->

<!-- ## Elastic net -->

<!-- ASV level -->

<!-- ```{r} -->

<!-- ``` -->

<!-- Genus level -->

<!-- ```{r, fig.width=10, fig.height=4, fig.fullwidth=TRUE} -->
<!-- roc_cs_ti <- roc_cs[4:6] -->
<!-- roc_cs_colon <- roc_cs[10:12] -->

<!-- roc_curve_ti <- roc_curve_all_custom(roc_cs_ti, Q = "Q2",legend = FALSE) -->

<!-- roc_curve_colon <- roc_curve_all_custom(roc_cs_colon, Q="Q2",legend=FALSE)  -->

<!-- roc_curve_plot <- ggarrange(ggplot() + theme_minimal(), -->
<!--                             roc_curve_ti, -->
<!--                             ggplot() + theme_minimal(), -->
<!--                             roc_curve_colon, -->
<!--                             ggplot() + theme_minimal(), -->
<!--                             ncol=5, widths = c(0.1,1,0.2,1,0.1), -->
<!--                             labels=c("","A","","B","")) -->

<!-- roc_curve_plot <- ggarrange(roc_curve_ti, -->
<!--                             ggplot() + theme_minimal(), -->
<!--                             roc_curve_colon, -->
<!--                             ggplot() + theme_minimal(), -->
<!--                             ncol=4, widths = c(1,0.1,1,0.1)) -->
<!-- roc_curve_plot -->

<!-- ``` -->

<!-- ```{r, fig.width=10, fig.height=12, fig.fullwidth=TRUE} -->
<!-- alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot, -->
<!--                         ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,1)) -->
<!-- alpha_beta_elastic -->
<!-- ``` -->

<!-- ## Dot heatmap - DAA -->

<!-- ASV level -->

<!-- ```{r} -->

<!-- ``` -->

<!-- Genus level -->

<!-- ```{r, fig.width=10, fig.height=13, fig.fullwidth=TRUE} -->
<!-- list_for_heatmap_ileum <- c(list_intersections[c(4,5,6)]) -->
<!-- list_for_heatmap_colon <- c(list_intersections[c(10,11,12)]) -->

<!-- names(list_for_heatmap_colon) <- gsub("Genus","",gsub("colon","",names(list_for_heatmap_colon))) -->
<!-- names(list_for_heatmap_ileum) <- gsub("Genus","",gsub("Ileum","",names(list_for_heatmap_ileum))) -->

<!-- p_ileum <- dot_heatmap_linda(list_for_heatmap_ileum, -->
<!--                              ileum_taxa_tab) + xlab("") + ylab("") +  -->
<!--   ggtitle("Terminal ileum") +  -->
<!--   theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15), -->
<!--         legend.position = "none") -->


<!-- p_colon <- dot_heatmap_linda(list_for_heatmap_colon, -->
<!--                              colon_taxa_tab) + xlab("") + ylab("") + -->
<!--   ggtitle("Colon") +  -->
<!--   theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15), -->
<!--         legend.position = "none") -->

<!-- heatmap_plot <- ggarrange(p_ileum, -->
<!--                           p_colon, -->
<!--                           ncol = 2)  -->
<!-- heatmap_plot -->
<!-- ``` -->

<!-- ## All together -->

<!-- ```{r, fig.width=18, fig.height=10, fig.fullwidth=TRUE} -->

<!-- q1_plot <- ggarrange(alpha_beta_elastic, -->
<!--                      ggplot()+ theme_minimal(), -->
<!--                      heatmap_plot, -->
<!--                       ggplot()+ theme_minimal(), -->
<!--                      ncol=4,labels=c("","", "D",""),widths = c(1,0.2,0.8,0.15)) -->
<!-- q1_plot -->
<!-- ``` -->

<!-- ```{r, eval=FALSE} -->
<!-- pdf("clanok/obrazky/q1/Q1_plot_newalpha.pdf",height = 10,width = 17) -->
<!-- q1_plot -->
<!-- dev.off() -->
<!-- ``` -->

<!-- # Session info -->

<!-- ```{r} -->
<!-- sessionInfo() -->
<!-- ``` -->
